Abstract:

We investigate the self-organising behaviour of Digital Ecosystems, 
because a primary motivation for our research is to exploit the 
self-organising properties of biological ecosystems. We extended a definition 
for the complexity, grounded in the biological sciences, providing a 
measure of the information in an organism's genome. Next, we extended 
a definition for the stability, originating from the computer sciences, 
based upon convergence to an equilibrium distribution. Finally, we 
investigated a definition for the diversity, relative to the selection 
pressures provided by the user requests. We conclude with a summary 
and discussion of the achievements, including the experimental results. 

1 Introduction 



Digital Ecosystems are distributed adaptive open socio-technical systems, 
with properties of self-organisation, scalability and sustainability, inspired by 
natural ecosystems (Briscoe, 2009, 2010; Briscoe and Sadedin, 2007; Briscoe 
et al., 2007), and are emerging as a novel approach to catalysing sustainable 
regional development driven by Small and Medium sized Enterprises (SMEs). 
Digital Ecosystems aim to help local economic actors become active players in 
globalisation, valorising their local culture and vocations, and enabling them to 
interact and create value networks at the global level (Dini et al., 2008; Stanley 
and Briscoe, 2010). 

Self-organisation is perhaps one of the most desirable features in the systems 
that we design, and a primary motivation for our research in Digital Ecosystems 
is the desire to exploit the self-organising properties of biological ecosystems 
(Levin, 1998), which are thought to be robust, scalable architectures that 
can automatically solve complex, dynamic problems. Over time a biological 
ecosystem becomes increasingly self-organised through the process of ecological 
succession (Begon et al., 1996), driven by the evolutionary self-organisation of the 
populations within the ecosystem. Analogously, a Digital Ecosystem's increasing 
self-organisation comes from the agent populations within being evolved to meet 
the dynamic selection pressures created by the requests from the user base. 
The self-organisation of biological ecosystems is often defined in terms of the 
complexity, stability, and diversity (King and Pimm, 1983), which we will also 
apply to our Digital Ecosystems. 

It is important for us to be able to understand, model, and define 
self-organising behaviour, determining macroscopic variables to characterise the 
self-organising behaviour of the order-constructing processes within the evolving 
agent populations (Briscoe and De Wilde, 2009b, c, a). However, existing definitions 
of self-organisation may not be directly applicable, because evolving agent 
populations possess properties of both computing systems (e.g. agent systems) 
and biological systems (e.g. population dynamics), and the combination 
of these properties makes them unique. So, to determine definitions for the 
self-organising complexity, stability, and diversity we will start by considering our 
Digital Ecosystems and the available literature on self-organisation, for its general 
properties, its application to Multi-Agent Systems (the dominant technology in 
Digital Ecosystems), and its application to our evolving agent populations. 

Figure 1 Digital Ecosystem: Optimisation architecture in which agents (representing 
services) travel along the P2P connections; in every node (habitat) local 
optimisation is performed through an evolutionary algorithm, where the 
search space is determined by the agents present at the node. 



2 The Digital Ecosystem 

Our Digital Ecosystem (Briscoe and De Wilde, 2006, 2010) provides a two-level 
optimisation scheme inspired by natural ecosystems, in which a decentralised 
peer-to-peer network forms an underlying tier of distributed agents. These agents 
then feed a second optimisation level based on an evolutionary algorithm that 
operates locally on single habitats (peers), aiming to find solutions that satisfy 
locally relevant constraints. The local search is sped up through this twofold 
process, providing better local optima as the distributed optimisation provides 
prior sampling of the search space by making use of computations already 
performed in other peers with similar constraints. So, the Digital Ecosystem 
supports the automatic combining of numerous agents (which represent services), 
by their interaction in evolving populations to meet user requests for applications, 
in a scalable architecture of distributed interconnected habitats. The sharing of 
agents between habitats ensures the system is scalable, while maintaining a high 
evolutionary specialisation for each user. The network of interconnected habitats 
is equivalent to the abiotic environment of biological ecosystems; combined with 
the agents, the populations, the agent migration for distributed evolutionary 
computing, and the environmental selection pressures provided by the user base, 
then the union of the habitats creates the Digital Ecosystem, which is summarised 
in Figure 1. The continuous and varying user requests for applications provide a 
dynamic evolutionary pressure on the applications (agent-sequences), which have 
to evolve to better fulfil those user requests, and without which there would be no 
driving force to the evolutionary self-organisation of the Digital Ecosystem. 

If we consider an example user base for the Digital Ecosystem, the use of 
Service-Oriented Architectures (SOAs) in its definition means that 
business-to-business (B2B) interaction scenarios lend themselves to being a potential user 
base for Digital Ecosystems. So, we can consider a business ecosystem of Small 
and Medium sized Enterprise (SME) networks (Moore, 1996), as a specific class 
of examples for B2B interaction scenarios; and in which the SME users are 
requesting and providing software services, represented as agents in the Digital 
Ecosystem, to fulfil the needs of their business processes, creating a Digital 
Business Ecosystem as shown in Figure 2. SOAs promise to provide potentially 
huge numbers of services that programmers can combine, via the standardised 
interfaces, to create increasingly more sophisticated and distributed applications. 
The Digital Ecosystem extends this concept with the automatic combining of 
available and applicable services, represented by agents, in a scalable architecture, 
to meet user requests for applications. These agents will recombine and evolve over 
time, constantly seeking to improve their effectiveness for the user base. From the 
SME users' point of view the Digital Ecosystem provides a network infrastructure 
where connected enterprises can advertise and search for services (real-world or 
software only), putting a particular emphasis on the composability of loosely 
coupled services and their optimisation to local and regional needs and conditions. 
To support these SME users the Digital Ecosystem is satisfying the companies' 
business requirements by finding the most suitable services or combination of 
services (applications) available in the network. An application (composition of 
services) is defined by an agent-sequence in the habitat network that can move 
from one peer (company) to another, being hosted only in those where it is most 
useful in satisfying the SME users' business needs. 

The agents consist of an executable component and an ontological description. 
So, the Digital Ecosystem can be considered a Multi-Agent System (MAS) which 
uses distributed evolutionary computing to combine suitable agents in order to meet 
user requests for applications. 

The landscape, in energy-centric biological ecosystems, defines the connectivity 
between habitats. Connectivity of nodes in the digital world is generally not 
defined by geography or spatial proximity, but by information or semantic 
proximity. For example, connectivity in a peer-to-peer network is based primarily 
on bandwidth and information content, and not geography. The island-models 
of distributed evolutionary computing use an information-centric model for the 
connectivity of nodes (islands) (Lin et al., 1994). However, because it is generally 
defined for one-time use (to evolve a solution to one problem and then stop) it 
usually has a fixed connectivity between the nodes, and therefore a fixed topology. 
So, supporting evolution in the Digital Ecosystem, with a multi-objective selection 
pressure (fitness landscape with many peaks), requires a re-configurable network 
topology, such that habitat connectivity can be dynamically adapted based on 
the observed migration paths of the agents between the users within the habitat 
network. Based on the island-models of distributed evolutionary computing (Lin 
et al., 1994), each connection between the habitats is bi-directional and there is a 
probability associated with moving in either direction across the connection, with 
the connection probabilities affecting the rate of migration of the agents. However, 
additionally, the connection probabilities will be updated by the success or failure 
of agent migration using the concept of Hebbian learning: the habitats which do 
not successfully exchange agents will become less strongly connected, and the 
habitats which do successfully exchange agents will achieve stronger connections. 
This leads to a topology that adapts over time, resulting in a network that 
supports and resembles the connectivity of the user base. Consider a business 
ecosystem (a network of SMEs) as an example user base: such business networks are 
typically small-world networks (White and Houseman, 2002). They have many 
strongly connected clusters (communities), called sub-networks (quasi-complete 
graphs), with a few connections between these clusters (communities) (Watts and 
Strogatz, 1998). Graphs with this topology have a very high clustering coefficient 
and small characteristic path lengths. So, the Digital Ecosystem will take on a 
topology similar to that of the user base, as shown in Figure 2. 
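
The Hebbian-style adaptation of connection probabilities described above can be illustrated with a minimal sketch. The update rule, the learning rate, and the bounds used here are our own illustrative assumptions, as the text does not specify them; the function name update_connection is hypothetical.

```python
def update_connection(prob, success, rate=0.1, floor=0.01, ceiling=0.99):
    """Hebbian-style update of the migration probability between two habitats.

    Assumed rule (not taken from the paper): a successful agent migration
    moves the probability a fraction `rate` of the way towards 1, and a
    failed migration moves it the same fraction of the way towards 0.
    """
    if success:
        prob += rate * (1.0 - prob)   # strengthen the connection
    else:
        prob -= rate * prob           # weaken the connection
    # Keep every connection usable but never certain.
    return min(ceiling, max(floor, prob))


# Replay a short, made-up history of migration outcomes on one connection.
p = 0.5
for outcome in [True, True, False, True]:
    p = update_connection(p, outcome)
print(round(p, 4))
```

Under such a rule, habitats that repeatedly exchange agents successfully drift towards strong connections, while unproductive links decay, which is the qualitative behaviour the adaptive topology relies upon.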

The novelty of our approach comes from the evolving populations being created 
in response to similar requests. So whereas in the island-models of distributed 
evolutionary computing there are multiple evolving populations in response to one 
request (Lin et al., 1994), here there are multiple evolving populations in response 
to similar requests. In our Digital Ecosystems different requests are evaluated on 
separate islands (populations), and so adaptation is accelerated by the sharing of 
solutions between evolving populations (islands), because they are working to solve 
similar requests (problems). 

Figure 2 Digital Business Ecosystem: Business ecosystem, network of SMEs (Moore, 
1996), using the Digital Ecosystem. The habitat clustering will therefore be 
parallel to the business sector communities. 

G Briscoe and P De Wilde 

The users will formulate queries to the Digital Ecosystem by creating a request 
as a semantic description, like those being used and developed in SOAs, specifying 
an application they desire and submitting it to their local peer (habitat). This 
description defines a metric for evaluating the fitness of a composition of agents, 
as a distance function between the semantic description of the request and the 
agents' ontological descriptions. A population is then instantiated in the user's 
habitat in response to the user's request, seeded from the agents available at 
their habitat. This allows the evolutionary optimisation to be accelerated in the 
following three ways: first, the habitat network provides a subset of the agents 
available globally, which is localised to the specific user it represents; second, 
the population makes use of applications (agent-sequences) previously evolved in 
response to the user's earlier requests; and third, it takes advantage of relevant 
applications evolved elsewhere in response to similar requests by other users. The 
population then proceeds to evolve the optimal application (agent-sequence) that 
fulfils the user request, and as the agents are the base unit for evolution, it 
searches the available agent combination space. An evolved agent-sequence 
(application) that is executed by the user then migrates to other peers (habitats), 
becoming hosted where it is useful, to combine with other agents in other 
populations to assist in responding to other user requests for applications. 
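
To make the idea of fitness as a distance function concrete, the following is a minimal sketch. It assumes, purely for illustration, that a request and each agent's ontological description can be reduced to a set of terms, and it uses the Jaccard distance between them; the actual semantic descriptions and metric are not specified here, and the names jaccard_distance and fitness are hypothetical.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two term sets: 0 means identical, 1 disjoint."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def fitness(request_terms, agent_sequence):
    """Distance between a request and an agent-sequence (lower is fitter).

    Assumption: each agent is described by a set of terms, and an
    agent-sequence offers the union of its agents' terms.
    """
    offered = set().union(*agent_sequence)
    return jaccard_distance(request_terms, offered)


# Made-up request and two candidate agent-sequences.
request = {"booking", "payment", "invoice"}
seq_a = [{"booking"}, {"payment", "invoice"}]
seq_b = [{"booking"}, {"shipping"}]

print(fitness(request, seq_a), fitness(request, seq_b))
```

In this sketch the first agent-sequence matches the request exactly and so has the smaller distance, which an evolutionary algorithm minimising this metric would favour.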



3 Self-Organisation 

The concept of self-organisation has been around since the late 1940s (Ashby, 1947), 
but has escaped general formalisation despite many attempts (Nicolis and Prigogine, 
1977; Kohonen, 1989). There have instead been many notions and definitions of 
self-organisation, useful within their different contexts (Heylighen, 2002). They have 
come from cybernetics (Ashby, 1947; Beer, 1966; Heylighen and Joslyn, 2001), 
thermodynamics (Nicolis and Prigogine, 1977), mathematics (Lendaris, 1964), 
information theory (Shalizi, 2001), synergetics (Haken, 1977), and other domains 
(Lehn, 1990). The term self-organising is widely used, but there is no generally 
accepted meaning, as the abundance of definitions would suggest. Therefore, the 
philosophy of self-organisation is complicated, because organisation has different 
meanings to different people. So, we would argue that any definition of 
self-organisation is context dependent, in the same way that a choice of statistical 
measure is dependent on the data being analysed. 

Proposing a definition for self-organisation faces the cybernetics problem of 
defining system, the cognitive problem of perspective, the philosophical problem 
of defining self, and the context dependent problem of defining organisation 
(Gershenson and Heylighen, 2003). 

The system in this context is an evolving agent population, with the replication 
of individuals from one generation to the next, the recombination of the 
individuals, and a selection pressure providing a differential fitness between the 
individuals, which is behaviour common to any evolving population (Begon et al., 
1996). 

Perspective can be defined as the perception of the observer in perceiving the 
self-organisation of a system (Ashby, 1962; Beer, 1966), matching the intuitive 
definition of 'I will know it when I see it' (Shalizi and Shalizi, 2003), which, despite 
making formalisation difficult, shows that organisation is perspective dependent (i.e. 
relative to the context in which it occurs). In the context of an evolutionary system, 
the observer does not exist in the traditional sense, but is the selection pressure 
imposed by the environment, which selects individuals of the population over 
others based on their observable fitness. Therefore, consistent with the theoretical 
biology (Begon et al., 1996), in an evolutionary system the self-organisation of its 
population is from the perspective of its environment. 

Whether a system is self-organising or being organised depends on whether 
the process causing the organisation is an internal component of the system under 
consideration. This intuitively makes sense, and therefore requires one to define 
the boundaries of the system being considered to determine if the force causing 
the organisation is internal or external to the system. For an evolving population 
the force leading to its organisation is the selection pressure acting upon it (Begon 
et al., 1996), which is formed by the environment of the population's existence 
and competition between the individuals of the population (Begon et al., 1996). 
As these are internal components of an evolving agent population (Begon et al., 
1996), it is a self-organising system. 

Now that we have defined, for an evolving agent population, the system for 
which its organisation is context dependent, the perspective to which it is relative, 
and the self by which it is caused, a definition for its self-organisation can be 
considered. The context, an evolving agent population in its environment, lacks a 
2D or 3D metric space, so it is necessary to consider a visualisation in a more 
abstract form. We will let a single square, □, represent an agent, with colours 
to represent different agents. Agent-sequences will therefore be represented by a 
sequence of coloured squares, □□□□, with a population consisting of multiple 
agent-sequences, as shown in Figure 3. 

In Figure 3 the number of agents, in total and of each colour, is the same 
in both populations. However, the agent population on the left intuitively shows 
organisation through the uniformity of the colours across the agent-sequences, 
whereas the population to the right shows little or no organisation. Following the 
study of biological ecosystems, which defines self-organisation in terms of the complexity, 
stability, and diversity relative to the perspective of the selection pressure (King 
and Pimm, 1983): the self-organised complexity of the system is the creation of 
coherent patterns and structures from the agents, the self-organised stability of 
the system is the resulting stability or instability that emerges over time in these 
coherent patterns and structures, and the self-organised diversity of the system is 
the optimal variability within these coherent patterns and structures. 

3.1 Definitions of Self-Organisation 

Many alternative definitions have been proposed for self-organisation within 
populations and agent systems, with each defining what property or properties 
demonstrate self-organisation. So, we will now consider the most applicable 
alternatives for their suitability in defining the self-organised complexity, stability, 
and diversity of an evolving agent population. 

(HIGH organisation) (LOW organisation) 

Figure 3 Visualisation of Self-Organisation in Evolving Agent Populations: The 
number of agents, in total and of each colour, is the same in both 
populations. However, the agent population on the left intuitively shows 
organisation through the uniformity of the colours across the agent-sequences, 
whereas the population to the right shows little or no organisation. 

One possibility would be the ε-machine definition of evolving populations, 
which models the emergence of organisation in pre-biotic evolutionary systems 
(Crutchfield and Görnerup, 2006). An ε-machine consists of a set of causal 
states and transitions between them, with symbols of an alphabet labelling the 
transitions and consisting of two parts: an input symbol that determines which 
transition to take from a state, and an output symbol which is emitted on taking 
that transition (Crutchfield and Görnerup, 2006). ε-machines have several key 
properties (Crutchfield and Young, 1989): all their recurrent states form a single, 
strongly connected component, their transitions are deterministic in the specific 
sense that a causal state with the edge symbol-pair determines the successor state, 
and an ε-machine is the smallest causal representation of the transformation 
it implements. The ε-machine definition of self-organisation also identifies the 
forms of complexity, stability, and diversity (Crutchfield and Görnerup, 2006), but 
with definitions focused on pre-biotic evolutionary systems, i.e. the primordial 
soup of chemical replicators from the origin of life (Rasmussen et al., 2004). 
Complexity is defined as a form of structural-complexity, measuring the 
state-machine-based information content of the ε-machine individuals of a population 
(Crutchfield and Görnerup, 2006). Stability is defined as a meta-machine, a set 
(composition) of ε-machines, that can be regarded as an autonomous and 
self-replicating entity (Crutchfield and Görnerup, 2006). Diversity is defined, using an 
interaction network, as the variability of interaction in a population (Crutchfield 
and Görnerup, 2006). So, while these definitions of self-organisation are compatible 
at the higher more abstract level, i.e. in the forms of self-organisation present, 
the deeper definitions of these forms are not applicable because they are context 
dependent. As we explained earlier, definitions of self-organisation are context 
dependent, and so the context of pre-biotic evolutionary systems, to which the 
ε-machine self-organisation applies, is very different to the 
context of an evolving agent population from our Digital Ecosystem. Evolving 
agent populations are defined from Ecosystem-Oriented Architectures, which have 
evolutionarily surpassed the context of pre-biotic evolutionary systems, shown 
by the necessity of our consideration of the later evolutionary stage of ecological 
succession (Begon et al., 1996; Briscoe, 2009). 

The Minimum Description Length principle (Barron et al., 1998) could be 
applied to the executable components or semantic descriptions of the agent- 
sequences of a population, with the best model, among a collection of tentatively 
suggested ones, being the one that provides the smallest stochastic complexity. 
However, the Minimum Description Length principle does not define how to select 
the family of model classes to be applied for determining the stochastic complexity 
(Hansen and Yu, 2001). This problem of model selection is well known and cannot 
be adequately formalised, and so in practice selection is based on human judgement 
and prior knowledge of the kinds of models previously chosen (Hansen and Yu, 
2001). Therefore, while models could be chosen to represent the self-organised 
complexity, and possibly even the diversity, there is no procedural method for 
determining these models, because subjective human intervention is required for 
model selection on a case-by-case basis. 

The Prügel-Bennett Shapiro formalism models the evolutionary dynamics of a 
population of sequences, using techniques from statistical mechanics and focusing 
on replica symmetry (Prügel-Bennett, 1997). The individual sequences are not 
considered directly, but in terms of the statistical properties of the population, 
using a macroscopic level of description with specific statistical properties 
to characterise the population, that are called macroscopics. A macroscopic 
formulation of an evolving population reduces the huge number of degrees of 
freedom to the dynamics of a few quantities, because a non-linear system of a few 
degrees of freedom can be readily solved or numerically iterated (Prügel-Bennett, 
1997). However, since a macroscopic description disregards a significant amount 
of information, subjective human insight is essential so that the appropriate 
macroscopics are chosen (Shapiro, 2001). So, while macroscopics could be chosen 
to represent the self-organised complexity, stability, and diversity, there is no 
procedural method for determining these macroscopics, because subjective human 
insight is required for macroscopic selection on a case-by-case basis. 

Kolmogorov-Chaitin complexity defines the complexity of binary sequences by 
the smallest possible Universal Turing Machine algorithm (programme and input) 
that produces the sequence (Li and Vitányi, 1997). A sequence is said to be regular 
if the algorithm necessary to produce it on a Universal Turing Machine is shorter 
than the sequence itself (Li and Vitányi, 1997). A regular sequence is said to be 
compressible, whereas its compression, into the most succinct Universal Turing 
Machine possible, is said to be incompressible as it cannot be reduced any further 
in length (Li and Vitányi, 1997). A random sequence is said to be incompressible, 
because the Universal Turing Machine to represent it cannot be shorter than the 
random sequence itself (Li and Vitányi, 1997). This intuitively makes sense for 
algorithmic complexity, because algorithmically regular sequences require a shorter 
programme to produce them. So, when measuring a population of sequences, the 
Kolmogorov-Chaitin complexity would be the shortest Universal Turing Machine 
to produce the entire population of sequences. However, Chaitin himself has 
considered the application of Kolmogorov-Chaitin complexity to evolutionary 
systems, and realised that although Kolmogorov-Chaitin complexity represents a 
satisfactory definition of randomness in algorithmic information theory, it is not 
so useful in biology (Chaitin, 1988). For evolving agent populations the problem 
manifests itself most significantly when the agents are randomly distributed within 
the agent-sequences of the population, which then has maximum Kolmogorov-Chaitin 
complexity instead of the zero complexity it ought to have. This property makes 
Kolmogorov-Chaitin complexity unsuitable as a definition for the self-organised 
complexity of an evolving agent population. 
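
The problem can be illustrated with a minimal sketch. Kolmogorov-Chaitin complexity itself is not computable, so the sketch substitutes the compressed size produced by a general-purpose compressor (zlib) as a rough upper-bound proxy; this substitution, the two-letter agent alphabet, and the sequence length are all assumptions made for illustration.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Compressed length in bytes, used as a crude proxy for algorithmic complexity."""
    return len(zlib.compress(data, 9))


rng = random.Random(0)      # fixed seed so the example is repeatable
n = 4096

# A regular arrangement of two agent types, and a random arrangement of the same types.
regular = b"AB" * (n // 2)
shuffled = bytes(rng.choice(b"AB") for _ in range(n))

print(compressed_size(regular), compressed_size(shuffled))
```

The randomly arranged sequence compresses far less well than the regular one, so a compression-based measure ranks it as the more complex of the two, even though it carries no organisation with respect to any selection pressure.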

A definition called Physical Complexity can be estimated for a population of 
sequences, calculated from the difference between the maximal entropy of the 
population, and the actual entropy of the population when in its environment 
(Adami et al., 2000). This Physical Complexity, based on Shannon's entropy of 
information, measures the information in the population about its environment, 
and therefore is conditional on its environment. It can be estimated by counting 
the number of loci that are fixed for the sequences of a population (Adami, 
1998). Physical Complexity would therefore be suitable as a definition of the 
self-organised complexity. However, a possible limitation is that Physical Complexity 
is currently only formulated for populations of sequences with the same length. 
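
As a minimal sketch of how such an estimate could be computed, the following takes the maximal entropy of a population of length-ℓ sequences to be ℓ (measuring entropy in units of the alphabet size) and subtracts the sum of the observed per-site entropies. The function name physical_complexity and the toy populations are illustrative assumptions, and the sketch ignores sampling corrections for small populations.

```python
import math
from collections import Counter


def physical_complexity(population, alphabet_size):
    """Estimate: sequence length minus the sum of per-site entropies.

    Entropies use logarithms to the base `alphabet_size`, so each site
    contributes between 0 (fixed) and 1 (uniformly random).
    """
    length = len(population[0])
    if any(len(seq) != length for seq in population):
        raise ValueError("all sequences must have the same length")
    n = len(population)
    total_entropy = 0.0
    for site in range(length):
        counts = Counter(seq[site] for seq in population)
        total_entropy -= sum(
            (c / n) * math.log(c / n, alphabet_size) for c in counts.values()
        )
    return length - total_entropy


# Two toy populations over the alphabet {A, B}.
fixed = ["AB", "AB", "AB", "AB"]    # every site fixed
mixed = ["AA", "AB", "BA", "BB"]    # every site uniformly random

print(physical_complexity(fixed, 2), physical_complexity(mixed, 2))
```

A population in which every site is fixed attains the maximum complexity (equal to the sequence length), whereas a population that is uniformly random at every site has a complexity of zero. The length check makes explicit the limitation noted above, namely that the measure is only formulated for sequences of the same length.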

Self-Organised Criticality in evolution is defined as a punctuated equilibrium 
in which the population's critical state occurs when the fitness of the individuals 
is uniform, and for which an avalanche, caused by the appearance and spread 
of advantageous mutations within the population, temporarily disrupts the 
uniformity of individual fitness across the population (Bak et al., 1988). Whether 
an evolutionary process displays Self-Organised Criticality remains unclear. There 
are those who claim that Self-Organised Criticality is demonstrated by the 
available fossil data (Sneppen et al., 1995), with a power law distribution on the 
lifetimes of genera drawn from fossil records, and by artificial life simulations 
(Adami, 1995), again with a power law distribution on the lifetimes of competing 
species. However, there are those who feel that the fossil data is inconclusive, 
and that the artificial life simulations do not show Self-Organised Criticality, 
because the key power law behaviour in both can be generated by models without 
Self-Organised Criticality (Newman, 1996). Also, Self-Organised Criticality 
does not define the resulting self-organised stability of the population, only the 
organisation of the events (avalanches) that occur in the population over time. 

Evolutionary Game Theory (Weibull, 1995) is the application of models inspired 
from population genetics to the area of game theory, which differs from classical 
game theory (Fudenberg and Tirole, 1991) by focusing on the dynamics of strategy 
change more than the properties of individual strategies. In Evolutionary Game 
Theory, agents of a population play a game, but instead of optimising over 
strategic alternatives, they inherit a fixed strategy and then replicate depending 
on the strategy's payoff (fitness) (Weibull, 1995). The self-organisation found in 
Evolutionary Game Theory is the presence of stable steady states, in which the 
genotype frequencies of the population cease to change over the generations. This 
equilibrium is reached when all the strategies have the same expected payoff, 
and is called a stable steady state, because a slight perturbation will not cause 
a move far from the state. An evolutionarily stable strategy leads to a stronger 
asymptotically stable state, as a slight perturbation causes only a temporary move 
away from the state before returning (Weibull, 1995). So, Evolutionary Game 
Theory is focused on genetic stability between competing individuals, 
rather than the stability of the population as a whole, which therefore limits its 
suitability for the self-organised stability of an evolving agent population. 
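
The notion of a stable steady state, in which all strategies have the same expected payoff, can be illustrated with a minimal sketch. It uses the standard Hawk-Dove game and an Euler-discretised replicator equation; the choice of game, the payoff values, the step size, and the function names are assumptions made for illustration only.

```python
def hawk_dove_payoffs(x, value=2.0, cost=4.0):
    """Expected payoffs of hawk and dove when a fraction x of the population plays hawk."""
    hawk = x * (value - cost) / 2.0 + (1.0 - x) * value
    dove = (1.0 - x) * value / 2.0
    return hawk, dove


def replicator(x, steps=2000, dt=0.1):
    """Iterate the replicator equation dx/dt = x * (payoff_hawk - mean_payoff)."""
    for _ in range(steps):
        hawk, dove = hawk_dove_payoffs(x)
        mean = x * hawk + (1.0 - x) * dove
        x += dt * x * (hawk - mean)
    return x


x_star = replicator(0.1)            # start with 10% hawks
print(round(x_star, 6), hawk_dove_payoffs(x_star))
```

With these payoffs the hawk fraction settles at value / cost = 0.5, where both strategies earn the same expected payoff, and starting from a different mix leads back to the same state. Note that this describes the stability of the strategy frequencies, not of the population as a whole.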

Multi-Agent Systems are the dominant computational technology in the 
evolving agent populations, and while there are several definitions of 
self-organisation (Parunak and Brueckner, 2001; Mamei and Zambonelli, 2003; 
Tianfield, 2005; Di Marzo Serugendo et al., 2006) and stability (Moreau, 2005; 
Weiss, 1999; Olfati-Saber et al., 2007) defined for Multi-Agent Systems, they 
are not applicable, primarily because of the evolutionary dynamics inherent in 
the context of evolving agent populations. However, Chli-DeWilde stability of 
Multi-Agent Systems (Chli et al., 2003) may be suitable, because it models 
Multi-Agent Systems as Markov chains, which are an established modelling approach in 
evolutionary computing (Rudolph, 1998). A Multi-Agent System is viewed as a 
discrete time Markov chain with potentially unknown transition probabilities, in 
which the agents are modelled as Markov processes, and is considered to be stable 
when its state has converged to an equilibrium distribution (Chli et al., 2003). 
Chli-DeWilde stability provides a strong notion of self-organised stability over 
time, but a possible limitation is that its current formulation does not support the 
necessary evolutionary dynamics. 
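
Convergence to an equilibrium distribution can be illustrated with a minimal sketch of a discrete time Markov chain. The three-state transition matrix is invented for illustration, and, unlike the setting described above, its transition probabilities are assumed to be known; the function names are hypothetical.

```python
def step(dist, P):
    """One step of the chain: the row vector dist multiplied by the matrix P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def equilibrium(P, dist, tol=1e-12, max_steps=10_000):
    """Iterate the chain until the state distribution stops changing."""
    for _ in range(max_steps):
        nxt = step(dist, P)
        if max(abs(a - b) for a, b in zip(nxt, dist)) < tol:
            return nxt
        dist = nxt
    raise RuntimeError("no convergence within max_steps")


# Illustrative transition matrix; each row sums to one.
P = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.1, 0.9],
]

pi = equilibrium(P, [1.0, 0.0, 0.0])    # start with all probability in state 0
print([round(p, 6) for p in pi])
```

Because this particular matrix is symmetric, the chain converges to the uniform distribution over its three states whatever the starting distribution, which is the sense of stability described above: the probability distribution over states, rather than the state itself, stops changing.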

The main concept in Mean Field Theory is that for any single particle the most 
important contribution to its interactions comes from its neighbouring particles 
(Parisi, 1998). Therefore, a particle's behaviour can be approximated by relying 
upon the mean field created by its neighbouring particles (Parisi, 1998), and 
so Mean Field Theory could be suitable as a definition for the self-organised 
diversity of an evolving agent population. Naturally, it requires a neighbourhood 
model to define interaction between neighbours (Parisi, 1998), and is therefore 
easily applied to domains such as Cellular Automata (Gutowitz et al., 1987). 
While a neighbourhood model is feasible for biological populations (Flyvbjerg 
et al., 1993), evolving agent populations lack such neighbourhood models based 
on a 2D or 3D metric space, with the only available neighbourhood model being 
a distance measure on a parameter space measuring dissimilarity. However, this 
type of neighbourhood model cannot represent the information-based interactions 
between the individuals of an evolving agent population, making Mean Field 
Theory unsuitable as a definition for the self-organised diversity of an evolving 
agent population. 



4 Complexity 

A definition for the self-organised complexity of an evolving agent population 
should define the creation of coherent patterns and structures from the agents 
within, with no initial constraints from modelling approaches for the inclusion of 
pre-defined specific behaviour, but capable of representing the appearance of such 
behaviour should it occur. 

None of the proposed definitions are directly applicable for the self-organised 
complexity of an evolving agent population. The ε-machine modelling (Crutchfield 
and Görnerup, 2006) is not applicable, because it is only defined within the context 
of pre-biotic populations. Neither is the Minimum Description Length principle 
(Barron et al., 1998) or the Prügel-Bennett Shapiro formalism (Prügel-Bennett, 
1997), because they require the involvement of subjective human judgement at the 
critical stage of model and quantifier selection (Hansen and Yu, 2001; Shapiro, 
2001). Kolmogorov-Chaitin complexity (Chaitin, 1988) is also not applicable as 
randomness is given maximum complexity. 

Physical Complexity (Adami et al., 2000) fulfils abstractly the required 
definition for the self-organised complexity of an evolving agent population, 
estimating complexity based upon the individuals of a population within the 
context of their environment. However, its current formulation is problematic, 
primarily because it is only defined for populations of fixed length, but as this 
is not a fundamental property of its definition (Adami et al., 2000) it should be 
feasible to redefine and extend it as needed. So, the use of Physical Complexity as 
a definition for the self-organised complexity of evolving agent populations will be 
investigated further to determine its suitability. 

4.1 Physical Complexity

Understanding DNA requires knowing the environment (context) in which it exists, 
which may initially appear obvious as DNA is considered to be the language of 
life and the purpose of life is to procreate or replicate (Dawkins, 2006). Virtually 
all activities of biological life-forms are directed towards this aim (Dawkins, 2006), with
a few exceptions (e.g. suicide before procreation), and to achieve replication 
requires resources, energy and matter to be harvested. So, for any individual 
the environment represents the problem of extracting energy for replication, and 
so their DNA sequence represents a solution to this problem. Even with this 
understanding it would seem we still need to define the environment to be able to 
distinguish the information from the redundancy in a solution (DNA sequence). 
Physical Complexity was born (Adami, 1998) from the need to determine
the proportion of information in sequences of DNA, because it has long been 
established that the information contained is not directly proportional to the 
length, known as the C-value enigma/paradox (Thomas Jr, 1971). However, 
because Physical Complexity analyses an ensemble of DNA sequences, the 
consistency between the different solutions shows the information, and the 
differences the redundancy (Adami, 2003). Entropy, a measure of disorder, 
is used to determine the redundancy from the information in the ensemble. 
Physical Complexity therefore provides a context-relative definition for the self- 
organisation of a population without needing to define the context (environment) 
explicitly (Adami and Cerf, 2000). Furthermore, an individual DNA solution is 
not necessarily a simple inverse of the problem that the environment represents, 
with forms of life having evolved specialised, specific and effective ways (niches) 
to acquire the necessary energy and matter for replication. 

Physical Complexity was derived (Adami and Cerf, 2000) from the notion of 
conditional complexity defined by Kolmogorov, which is different from traditional 
Kolmogorov complexity and states that the determination of complexity of a 
sequence is conditional on the environment in which the sequence is interpreted 
(Li and Vitanyi, 1997). So, the complexity of a population S, of sequences s,

$$C = \ell - \sum_{i=1}^{\ell} H(i), \qquad (1)$$

is the maximal entropy of the population (equivalent to the length of the
sequences) $\ell$, minus the sum, over the length $\ell$, of the per-site entropies $H(i)$,

$$H(i) = -\sum_{d \in D} p_d(i) \log_{|D|} p_d(i), \qquad (2)$$

where i is a site in the sequences ranging between one and the length of the
sequences $\ell$, D is the alphabet of characters found in the sequences, and $p_d(i)$ is the
probability that site i (in the sequences) takes on character d from the alphabet D,
with the sum of the $p_d(i)$ probabilities for each site i equalling one,
$\sum_{d \in D} p_d(i) = 1$ (Adami and Cerf, 2000). So, the equivalence of the maximum
complexity to the length matches the intuitive understanding that if a population of
sequences of length $\ell$ has no redundancy, then their complexity is their length $\ell$.
Taking the log to the base $|D|$ conveniently normalises $H(i)$ to range between zero
and one.
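As an illustration of (1) and (2), the following is a minimal Python sketch, assuming the population is given as a list of equal-length strings. The function names `per_site_entropy` and `physical_complexity` are hypothetical and chosen only for this example; they are not taken from Adami and Cerf (2000).

```python
import math
from collections import Counter


def per_site_entropy(column, alphabet_size):
    """Per-site entropy H(i) of one site, with logs to base |D| as in (2)."""
    total = len(column)
    entropy = 0.0
    for count in Counter(column).values():
        p = count / total  # estimate of p_d(i) from the sample
        entropy -= p * math.log(p, alphabet_size)
    return entropy


def physical_complexity(population, alphabet):
    """Complexity C = l - sum_i H(i) for equal-length sequences, as in (1)."""
    length = len(population[0])
    if any(len(seq) != length for seq in population):
        raise ValueError("all sequences must share one length")
    columns = zip(*population)  # one tuple of characters per site
    return length - sum(per_site_entropy(col, len(alphabet)) for col in columns)


# Illustrative data only: 16 identical RNA-like sequences of length 4.
print(physical_complexity(["ACGU"] * 16, "ACGU"))  # 4.0
```

A population with no variation at any site reaches the maximum, C equal to the length, while a population in which every character is equally likely at every site gives a complexity of zero.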

If G represents the set of all possible genotypes constructed from an alphabet
D that are of length $\ell$, then the size (cardinality) of G, $|G|$, is equal to the size
of the alphabet $|D|$ raised to the length $\ell$,

$$|G| = |D|^{\ell}. \qquad (3)$$

For the complexity measure to be accurate, a sample size of $|D|^{\ell}$ is suggested
to minimise the error (Adami and Cerf, 2000), but such a large quantity can
be computationally infeasible. The definition's creator, for practical applications,
chooses a population size of $|D|\ell$, sufficient to show any trends present. So,
for a population of sequences S we choose, with the definition's creator, a
computationally feasible population size of $|D|$ times the length $\ell$,

$$|S| \geq |D|\ell. \qquad (4)$$

The size of the alphabet, $|D|$, depends on the domain to which Physical
Complexity is applied. For RNA the alphabet is the four nucleotides, D =
{A,C,G,U}, and therefore $|D| = 4$ (Adami and Cerf, 2000). When Physical
Complexity was applied to the Avida simulation software, there was an alphabet
size of twenty-eight, $|D| = 28$, as that was the size of the instruction set for the
self-replicating programmes (Adami et al., 2000).

4.2 Variable Length Sequences

Physical Complexity is currently formulated for a population of sequences of the
same length (Adami and Cerf, 2000), and so we will now investigate an extension
to include populations of variable length sequences, which will include populations
of variable length agent-sequences of our Digital Ecosystem. This requires changing
and re-justifying the fundamental assumptions, specifically the conditions and
limits upon which Physical Complexity operates. In (1) the Physical Complexity,
C, is defined for a population of sequences of length $\ell$ (Adami and Cerf, 2000).
The most important question is what does the length $\ell$ equal if the population
of sequences is of variable length? The issue is what $\ell$ represents, which is the
maximum possible complexity for the population (Adami and Cerf, 2000), which
will be called the complexity potential $C_P$. The maximum complexity in (1) occurs
when the per-site entropies sum to zero, $\sum_{i=1}^{\ell} H(i) \to 0$, as there is no randomness
in the sites (all contain information), i.e. $C \to \ell$ (Adami and Cerf, 2000). So, the
complexity potential equals the length,

$$C_P = \ell, \qquad (5)$$

provided the population S is of sufficient size for accurate calculations, as found
in (4), i.e. $|S|$ is equal to or greater than $|D|\ell$. For a population of variable length
sequences, $S_V$, the complexity potential, $C_{V_P}$, cannot be equivalent to the length
because it does not exist. However, given the concept of a minimum sample
size from (4), there is a length for a population of variable length sequences,
$\ell_V$, between the minimum and maximum length, such that the number of per-site
samples up to and including $\ell_V$ is sufficient for the per-site entropies to
be calculated. So the complexity potential for a population of variable length
sequences, $C_{V_P}$, will be equivalent to its calculable length,

$$C_{V_P} = \ell_V. \qquad (6)$$

If $\ell_V$ were to be equal to the length of the longest individual(s) $\ell_{max}$ in a
population of variable length sequences $S_V$, then the operational problem is that
for some of the later sites, between one and $\ell_{max}$, the sample size will be less
than the population size $|S_V|$. So, having the length $\ell_V$ equal the maximum length
would be incorrect, as there would be an insufficient number of samples at the later
sites, and therefore $\ell_V \neq \ell_{max}$. So, the length for a population of variable length
sequences, $\ell_V$, is the highest value within the range of the minimum (one) and
maximum length, $1 \leq \ell_V \leq \ell_{max}$, for which there are sufficient samples to calculate
the entropy. A function which provides the sample size at a given site is required
to specify the value of $\ell_V$ precisely,

sampleSize(i : site) : int, (7)

where the output varies between 1 and the population size $|S_V|$ (inclusive).
Therefore, the length of a population of variable length sequences, $\ell_V$, is the
highest value within the range of one and the maximum length, for which the
sample size is greater than or equal to the alphabet size multiplied by the length
$\ell_V$,

$$\mathrm{sampleSize}(\ell_V) \geq |D|\ell_V \;\wedge\; \mathrm{sampleSize}(\ell_V + 1) < |D|\ell_V, \qquad (8)$$

where $\ell_V$ is the length for a population of variable length sequences, and $\ell_{max}$
is the maximum length in a population of variable length sequences, $\ell_V$ varies
between $1 \leq \ell_V \leq \ell_{max}$, D is the alphabet and $|D| > 0$. This definition intrinsically
includes a minimum size for populations of variable length sequences, $|D|\ell_V$, and
therefore is the counterpart of (4), which is the minimum population size for
populations of fixed length.

The length $\ell$ used in the limits of (2) no longer exists, and therefore (2) must
be updated; so, the per-site entropy calculation for variable length sequences will
be denoted by $H_V(i)$, and is,

$$H_V(i) = -\sum_{d \in D} p_d(i) \log_{|D|} p_d(i), \qquad (9)$$

where D is still the alphabet, $\ell_V$ is the length for a population of variable
length sequences, with the site i now ranging between $1 \leq i \leq \ell_V$, while the $p_d(i)$
probabilities still range between $0 \leq p_d(i) \leq 1$, and still sum to one. It remains
almost algebraically identical to (2), but the conditions and constraints of its use
will change, specifically $\ell$ is replaced by $\ell_V$. Naturally, $H_V(i)$ ranges between zero
and one, as did $H(i)$ in (2). So, when the entropy is maximum the character found
in the site i is uniformly random, holding no information.

Therefore, the complexity for a population of variable length sequences, $C_V$, is
the complexity potential of the population of variable length sequences minus the
sum, over the length of the population of variable length sequences, of the per-site
entropies (9),

$$C_V = \ell_V - \sum_{i=1}^{\ell_V} H_V(i), \qquad (10)$$

where $\ell_V$ is the length for the population of variable length sequences, and $H_V(i)$
is the entropy for a site i in the population of variable length sequences.
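The extension to variable length sequences can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the population is a list of strings, sites are 1-indexed as in the text, and the calculable length is found using the prose reading of (8), namely the highest length whose sample size is at least the alphabet size multiplied by that length. The names `sample_size`, `calculable_length`, `per_site_entropy` and `variable_length_complexity` are hypothetical.

```python
import math
from collections import Counter


def sample_size(population, site):
    """Number of sequences long enough to have a character at this site, as in (7)."""
    return sum(1 for seq in population if len(seq) >= site)


def calculable_length(population, alphabet_size):
    """Highest length l_V with sampleSize(l_V) >= |D| * l_V; 0 if there is none."""
    best = 0
    for length in range(1, max(len(seq) for seq in population) + 1):
        if sample_size(population, length) >= alphabet_size * length:
            best = length
    return best


def per_site_entropy(population, site, alphabet_size):
    """H_V(i) from (9), using only the sequences that reach the site."""
    column = [seq[site - 1] for seq in population if len(seq) >= site]
    entropy = 0.0
    for count in Counter(column).values():
        p = count / len(column)
        entropy -= p * math.log(p, alphabet_size)
    return entropy


def variable_length_complexity(population, alphabet_size):
    """Return (C_V, l_V), with C_V = l_V - sum of H_V(i) for i = 1..l_V, as in (10)."""
    l_v = calculable_length(population, alphabet_size)
    entropy_sum = sum(per_site_entropy(population, i, alphabet_size)
                      for i in range(1, l_v + 1))
    return l_v - entropy_sum, l_v


# Illustrative data only: |D| = 3, maximum length 6, but only 3 samples at site 6.
population = ["abcab"] * 15 + ["abcabc"] * 3
print(variable_length_complexity(population, 3))  # (5.0, 5)
```

In this made-up population the sixth site has too few samples, so the calculable length is 5 rather than the maximum length of 6, and only the first five sites contribute to the complexity.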






4.3 Efficiency

Physical Complexity can now be applied to populations of variable length 
sequences, and so we will now consider the abstract example populations in 
Figure 4. We will let a single square, □, represent a site i in the sequences,
with different colours to represent the different values. Therefore, a sequence of
sites will be represented by a sequence of coloured squares. Furthermore,
the alphabet D is the set {□,□,□} of three differently coloured squares, the
maximum length $\ell_{max}$ is 6 and the length for populations of variable length
sequences $\ell_V$ is calculated from (8) as
5. The Physical Complexity values in Figure 4 are consistent with the intuitive
understanding one would have for the self-organisation of the sample populations; 
the population with high Physical Complexity has a little randomness, while the 
population with low Physical Complexity is almost entirely random. 

Using our extended Physical Complexity we can construct a measure showing
the use of the information space, called the Efficiency E, which is calculated by
the Physical Complexity $C_V$ over the complexity potential $C_{V_P}$,

$$E = \frac{C_V}{C_{V_P}}. \qquad (11)$$

The Efficiency E will range between zero and one, reaching its maximum when the
actual complexity $C_V$ equals the complexity potential $C_{V_P}$, indicating that there
is no randomness in the population. In Figure 4 the populations of sequences are 
shown with their respective Efficiency values as percentages, and the values are as 
one would expect. 

The complexity $C_V$ (10) is an absolute measure, whereas the Efficiency E (11)
is a relative measure based on the complexity $C_V$. So, the Efficiency E can be
used to compare the self-organised complexity of populations, independent of their 
size, their length, and whether their lengths are variable or not (as it is equally 
applicable to the fixed length populations of the original Physical Complexity). 
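
As a worked check of (11), the values reported with Figure 4 can be substituted directly, taking the complexity potential to be the calculable length of 5 stated in the text:

```latex
E = \frac{C_V}{C_{V_P}} = \frac{4.420}{5} = 0.884 \;(88.4\%),
\qquad
E = \frac{C_V}{C_{V_P}} = \frac{0.575}{5} = 0.115 \;(11.5\%).
```

Both results agree with the percentages given for the two example populations.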

4.4 Clustering

The self-organised complexity of an evolving agent population is the clustering,
amassing of same or similar sequences, around the optimum genome (Begon et al.,
1996). This can be visualised on a fitness landscape (Wright, 1932), which shows
the combination space (power set) of the alphabet D against the fitness values
from the selection pressure (user request). The agent-sequences of an evolving
population will evolve, clustering around the optimal genome, assuming that its
evolutionary process does not become trapped while clustering over local optima,
as shown in Figure 5.

Figure 4 Abstract Visualisation for populations of Variable Length Sequences: The
Physical Complexity and Efficiency values are consistent with the intuitive
understanding one would have for the self-organisation. [Panel values: high
Physical Complexity population, $C_V$ = 4.420, %E = 88.4; low Physical Complexity
population, $C_V$ = 0.575, %E = 11.5.]




Figure 5 3D Fitness Landscape with a Global Optimum: This shows the combination 
space (power set) of the alphabet D against the fitness values from the 
selection pressure (user request), resulting in a global optimum. 



Clustering is indicated by the Efficiency E tending to its maximum, as the
population's Physical Complexity $C_V$ tends to the complexity potential $C_{V_P}$,
because an optimal sequence is becoming dominant in the population, and
therefore increasing the uniformity of the sites across the population. With a
global optimum, the Efficiency E tends to a maximum of one, indicating that the
evolving population of sequences is tending to a set of clusters T of size one,

$$E = \frac{C_V}{C_{V_P}} \to 1 \text{ as } |T| \to 1, \qquad (12)$$

assuming its evolutionary process does not become trapped at local optima. So, 
the tending of the Efficiency E provides a clustering coefficient. It tends, never 
quite reaching its maximum, because of the mutation inherent in the evolutionary 
process. 

The other extreme scenario occurs when the number of clusters equals the size 
of the population, which would only occur with a flat fitness landscape (Kimura, 
1983) resulting from a non-discriminating selection pressure, as shown in Figure 
6. The population occupancy is uniformly random, as any position (sequence) has 
the same fitness as any other. So the entropy (randomness) tends to maximum,
resulting in the complexity $C_V$ tending to zero, and therefore the Efficiency E also
tending to zero, while the number of clusters $|T|$ tends to the number of sequences
in the population $|S|$,

$$E = \frac{C_V}{C_{V_P}} \to 0 \text{ as } |T| \to |S|. \qquad (13)$$

Figure 6 3D Fitness Landscape with No Optimum: Theoretical extreme scenario in
which the selection pressure is non-discriminating. So, the population 
occupancy of the fitness landscape would then be uniformly random. 



So the number of clusters $|T|$ tends to the population size $|S|$, with each cluster
consisting of only one unique sequence (individual). 

If there are global optima, as there are in Figure 7, the Efficiency E will tend
to a maximum below one, because the population of sequences consists of more
than one cluster, with each having an Efficiency tending to a maximum of one.
The simplest scenario of clusters is pure clusters; pure meaning that each cluster
uses a distinct (mutually exclusive) subset of the alphabet D relative to any other
cluster. In this scenario the Efficiency E tends to a value based on the number
of clusters $|T|$, because a number of the $p_d(i)$ probabilities at each site in (9) are
the reciprocal of the number of clusters, $\frac{1}{|T|}$. So, given that the number of the
$p_d(i)$ probabilities taking the value $\frac{1}{|T|}$ is equal to the number of clusters, while the
other $p_d(i)$ probabilities take a value of zero, then the per-site entropy calculation
of $H_V(i)$ from (9) becomes

$$H_V(i) = \log_{|D|} |T|, \qquad (14)$$

where i is the site, $|D|$ is the alphabet size, and $|T|$ is the number of clusters.
Hence, given (14), (10), and (6), then the Efficiency E from (11) becomes

$$E = 1 - \log_{|D|} |T|, \qquad (15)$$

where $|D|$ is the alphabet size and $|T|$ is the number of clusters. Therefore, the
Efficiency E, the clustering coefficient, tends to a value that can be used to
determine the number of pure clusters in an evolving population of sequences.

Figure 7 3D Fitness Landscape with Global Optima: Clustering scenario, in which
the Efficiency E of the population S tends to a value based on the number of
clusters $|T|$, because of clustering around multiple optima.
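
To make the use of (15) concrete, the relationship can be inverted to estimate the number of pure clusters from an observed Efficiency, $|T| = |D|^{1 - E}$. The short Python sketch below is illustrative only; the function names are hypothetical, and the inversion assumes the pure-cluster scenario described above.

```python
import math


def pure_cluster_efficiency(alphabet_size, cluster_count):
    """Limiting Efficiency E = 1 - log_|D| |T| for pure clusters, from (15)."""
    return 1.0 - math.log(cluster_count, alphabet_size)


def estimated_cluster_count(alphabet_size, efficiency):
    """Invert (15): |T| = |D| ** (1 - E)."""
    return alphabet_size ** (1.0 - efficiency)


print(round(pure_cluster_efficiency(3, 2), 3))      # 0.369
print(round(estimated_cluster_count(3, 0.369), 2))  # 2.0
```

With an alphabet of size three and two pure clusters, the limiting Efficiency is approximately 0.369, which matches the 36.9% reported with Figure 8.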

For a population S with clusters, each cluster is a sub-population with an
Efficiency E tending to a maximum of one. To specify this relationship we require
a function that provides the Efficiency E (11) of a population or sub-population
of sequences,

efficiency(input : population) : int. (16)

So, for a population S consisting of a set of clusters T, each member (cluster) t is
therefore a sub-population of the population S, and is defined as

$$\left( \forall t \in T : \; t \subseteq S \;\wedge\; \mathrm{efficiency}(t) \to 1 \;\wedge\; |t| \approx \frac{|S|}{|T|} \right) \Rightarrow \sum_{t \in T} |t| = |S|, \qquad (17)$$

where a cluster t has an Efficiency E tending to a maximum of one, and the cluster
size $|t|$ is approximately equal to the population size $|S|$ divided by the number of
clusters $|T|$. It is only approximately equal because of variation from mutation, and
because the population size may not divide to a whole number. These conditions
are true for all members t of the set of clusters T, and therefore the summation of
the cluster sizes $|t|$ equals the size of the population $|S|$.

The population of sequences from the fitness landscape of Figure 7 is visualised
in Figure 8, but the clusters within cannot be seen. So, the population is arranged
to show the clustering in Figure 9, in which the two clusters are clearly evident.
The clusters of the population have Efficiency values tending to a maximum of
one, compared to the Efficiency of the population as a whole, which is tending to
a maximum significantly below one. This is the expected behaviour of clusters as
defined in (17).

Figure 8 Population with Hidden Clusters: Visualisation for the population of
sequences from a population with global optima, with clusters visually hard
to identify. [Figure values: $C_V$ = 1.107, $C_{V_P} = \ell_V$ = 3, %E = 36.9;
D = alphabet = {□,□,□}.]

Figure 9 Population with Clusters Visible: Visualisation for a population of sequences
with global optima, which has been arranged to show the clusters present.
[Figure values: $C_{V_P} = \ell_V$ = 3; Cluster 1, $C_V$ = 2.704, E = 0.901; Cluster 2,
$C_V$ = 3, E = 1.]

The population size $|S|$, in Figure 9, is double the minimum requirement
specified in (8), so that the complexity $C_V$ (10) and Efficiency E (11) could be used
in defining the principles of clustering without redefining the length of a population
of variable length sequences $\ell_V$ (8). However, when determining the variable length
$\ell_V$ of a cluster t, the sample size requirement is different, specifically a cluster t is
a sub-population of S, and therefore by definition cannot have a population size
equivalent to S (unless the population consists of only one cluster). Therefore, to
manage clusters requires a reformulation of $\ell_V$ (8) to

$$\mathrm{sampleSize}(\ell_V) \geq \frac{|D|\ell_V}{|T|} \;\wedge\; \mathrm{sampleSize}(\ell_V + 1) < \frac{|D|\ell_V}{|T|}, \qquad (18)$$

where $\ell_{max}$ is the maximum length in a population of variable length sequences,
$\ell_V$ varies between $1 \leq \ell_V \leq \ell_{max}$, D is the alphabet, $|D| > 0$, and T is the set of
clusters in the population S.

A population with clusters will always have an Efficiency E tending towards a
maximum significantly below one. Therefore, managing populations with clusters
requires a reformulation of the Efficiency (11) to

$$E_c(S) = \begin{cases} E(S) & \text{if } |T| = 1 \\ \dfrac{\sum_{t \in T} E_c(t)}{|T|} & \text{if } |T| > 1 \end{cases} \qquad (19)$$

where t is a cluster, and a member of the set of clusters T of the population S.
So, the Efficiency $E_c$ is equivalent to the Efficiency E if the population consists of
only one cluster, but if there are clusters then the Efficiency $E_c$ is the average of
the Efficiency E values of the clusters.
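
The cluster-aware Efficiency of (19) can be sketched as below. This is a simplified illustration that assumes the clusters have already been identified and are passed in explicitly, and that all sequences share one length (the fixed-length special case), so the complexity potential is simply that length. The names `efficiency` and `cluster_efficiency` are hypothetical.

```python
import math
from collections import Counter


def efficiency(population, alphabet_size):
    """Efficiency E = C / l of equal-length sequences, combining (1), (2) and (11)."""
    length = len(population[0])
    entropy_sum = 0.0
    for column in zip(*population):  # one column of characters per site
        for count in Counter(column).values():
            p = count / len(column)
            entropy_sum -= p * math.log(p, alphabet_size)
    return (length - entropy_sum) / length


def cluster_efficiency(clusters, alphabet_size):
    """E_c from (19): plain E for a single cluster, otherwise the mean over clusters."""
    if len(clusters) == 1:
        return efficiency(clusters[0], alphabet_size)
    return sum(efficiency(c, alphabet_size) for c in clusters) / len(clusters)


# Illustrative data only: two pure clusters over an alphabet of size three.
cluster_1 = ["aaa"] * 9
cluster_2 = ["bbb"] * 9
print(round(efficiency(cluster_1 + cluster_2, 3), 3))  # 0.369
print(cluster_efficiency([cluster_1, cluster_2], 3))   # 1.0
```

The whole population scores well below one because the two clusters disagree at every site, whereas averaging over the clusters recovers an Efficiency of one.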



4.5 Atomicity

Atomicity is the property of a set of agents, such that no single agent can 
functionally replace any agent-sequence, i.e. their functionality is mutually 
exclusive to one another. It is important because non-atomicity can adversely
affect the uniformity of the calculated per-site entropies, which is the main
construct of the Physical Complexity measure, and so non-atomicity risks
introducing error when calculating the information content. Our extensions to 
Physical Complexity to support clustering are also necessary to manage non- 
atomicity, because it leads to the formation of clusters within evolving agent 
populations. The presence of clusters can be identified by the clustering coefficient, 
the Efficiency E tending to a value below one, with the Efficiency $E_c$ (19)
used to calculate the actual Efficiency as it supports clustering and therefore 
non-atomicity. 

Consider the example population shown in Figure 10, which is constructed
from an alphabet in which the yellow agent □ can functionally replace a green-blue
agent-sequence □□, so that the uniformity across site two is lost. Therefore, the
Efficiency E of the population is a half, whereas the Efficiency $E_c$ for populations
with clusters is one, because it supports clustering and therefore non-atomicity.

Figure 10 Population Constructed from a Non-Atomic Alphabet: The population is
constructed from an alphabet in which the yellow agent is able to functionally
replace a green-blue agent-sequence. [Figure values: whole population, E = 0.5,
$C_V$ = 1, $C_{V_P}$ = 2, $E_c$ = 1; Cluster 1, $C_V$ = 3, $C_{V_P}$ = 3, E = 1; Cluster 2,
$C_V$ = 2, $C_{V_P}$ = 2, E = 1.]



5 Stability 

A definition for the self-organised stability of an evolving agent population should 
define the resulting stability or instability that emerges over time, with no initial 
constraints from modelling approaches for the inclusion of pre-defined specific 
behaviour, but capable of representing the appearance of such behaviour should it 
occur. 

None of the proposed definitions are directly applicable for the self-organised 
stability of an evolving agent population. The ε-machine modelling (Crutchfield
and Görnerup, 2006) is not applicable, because it is only defined within the
context of pre-biotic populations. The Prügel-Bennett Shapiro formalism
(Prügel-Bennett, 1997) is not suitable, because it necessitates the involvement of subjective
human judgement at the critical stage of quantifier selection. Self-Organised 
Criticality (Bak et al., 1988) is also not applicable as it only models the events of 
genetic change in the population over time, rather than measuring the resulting 
stability or instability of the population. Neither is Evolutionary Game Theory 
(Weibull, 1995), which only defines the genetic stability of the genotypes, in 
terms of equilibrium and non-equilibrium dynamics, instead of the stability of the
population as a whole. 

Chli-DeWilde stability of Multi-Agent Systems (Chli et al., 2003) does fulfil
the required definition of the self-organised stability, measuring convergence to 
an equilibrium distribution. However, its current formulation does not include 
Multi-Agent Systems that make use of evolutionary computing algorithms, i.e.
our evolving agent populations, but it could be extended to include such Multi- 
Agent Systems, because its Markov-based modelling approach is well established 
in evolutionary computing (Rudolph, 1998). While there has been past work on 
modelling evolutionary computing algorithms as Markov chains (Rudolph, 1994; 
Nix and Vose, 1992; Goldberg and Segrest, 1987; Eiben et al., 1991), we have found 
none including Multi-Agent Systems despite both being mature research areas 
(Nwana, 1996; Marrow, 2000), because their integration is a recent development 
(Smith and Taylor, 1998). So, the use of Chli-DeWilde stability as a definition 
for the self-organised stability of evolving agent populations will be investigated 
further to determine its suitability. 

5.1 Chli-DeWilde Agent Stability 

We will now briefly introduce Chli-DeWilde stability for Multi-Agent Systems and
Evolutionary Computing, sufficiently to allow for the derivation of our extensions
to Chli-DeWilde stability to include Multi-Agent Systems with Evolutionary
Computing. Chli-DeWilde stability was created to provide a clear notion of
stability in MASs (Chli et al., 2003), because stability is perhaps one of the 
most desirable features of any engineered system, given the importance of being 
able to predict its response to various environmental conditions prior to actual 
deployment; and while computer scientists often talk about stable or unstable 
systems (Thomas and Sycara, 1998; Balakrishnan et al., 1997), they did so without 
having a concrete or uniform definition of stability. Also, other properties had been 
widely investigated, such as openness (Abramov et al., 2001), scalability (Marwala 
et al., 2001) and adaptability (Simoes-Marques et al., 2003), but not stability. So, 
the Chli-DeWilde definition of stability for MASs was created (Chli et al., 2003), 
based on the stationary distribution of a stochastic system, modelling the agents 
as Markov processes, and therefore viewing a MAS as a discrete time Markov chain 
with a potentially unknown transition probability distribution. The MAS is stable 
once its state, a stochastic process, has converged to an equilibrium distribution 
(Chli et al., 2003), because stability of a system can be understood intuitively as 
exhibiting bounded behaviour. 

Chli-DeWilde stability was derived (Chli, 2006) from the notion of stability
defined by De Wilde (De Wilde et al., 1999; Lee et al., 1998), based on the
stationary distribution of a stochastic system, making use of discrete-time Markov
chains, which we will now introduce. Let I be a countable set, such that each
$i \in I$ is called a state and I is called the state-space. We can then say that
$\lambda = (\lambda_i : i \in I)$ is a measure on I if $0 \leq \lambda_i < \infty$ for all $i \in I$, and additionally
a distribution if $\sum_{i \in I} \lambda_i = 1$ (Chli, 2006). So, if X is a random variable taking
values in I and we have $\lambda_i = \Pr(X = i)$, then $\lambda$ is the distribution of X, and we can
say that a matrix $P = (p_{ij} : i, j \in I)$ is stochastic if every row $(p_{ij} : j \in I)$ is a
distribution (Chli, 2006). We can then extend familiar notions of matrix and vector
multiplication to cover a general index set I of potentially infinite size, by defining
the multiplication of a matrix by a measure as $\lambda P$, which is given by

$$(\lambda P)_j = \sum_{i \in I} \lambda_i p_{ij}. \qquad (20)$$

We can now describe the rules for a Markov chain by a definition in terms of the
corresponding matrix P (Chli, 2006).

Definition 5.1: We say that $(X^t)_{t \geq 0}$ is a Markov chain with initial distribution
$\lambda = (\lambda_i : i \in I)$ and transition matrix $P = (p_{ij} : i, j \in I)$ if:

1. $\Pr(X^0 = i_0) = \lambda_{i_0}$ and

2. $\Pr(X^{t+1} = i_{t+1} \mid X^0 = i_0, \ldots, X^t = i_t) = p_{i_t i_{t+1}}$.

We abbreviate these two conditions by saying that $(X^t)_{t \geq 0}$ is Markov($\lambda$, P).

In this first definition the Markov process is memoryless¹, resulting in only the
current state of the system being required to describe its subsequent behaviour.
So, we say that a Markov process $X^0, X^1, \ldots, X^t$ has a stationary distribution if
the probability distribution of $X^t$ becomes independent of the time t (Chli et al.,
2003). Therefore, the following theorem is an easy consequence of the second
condition from the first definition.



Theorem 5.2: A discrete-time random process $(X^t)_{t \geq 0}$ is Markov($\lambda$, P), if and
only if for all t and $i_0, \ldots, i_t$ we have

$$\Pr(X^0 = i_0, \ldots, X^t = i_t) = \lambda_{i_0} p_{i_0 i_1} \cdots p_{i_{t-1} i_t}. \qquad (21)$$

This first theorem depicts the structure of a Markov chain (Chli, 2006; Norris,
1997; Cox and Miller, 1977), illustrating the relation with the stochastic matrix 
P. The next Theorem shows how the Markov chain evolves in time, again showing 
the role of the matrix P. 



Theorem 5.3: Let $(X^t)_{t \geq 0}$ be Markov($\lambda$, P), then for all $t, s \geq 0$:

1. $\Pr(X^t = j) = (\lambda P^t)_j$ and

2. $\Pr(X^t = j \mid X^0 = i) = \Pr(X^{t+s} = j \mid X^s = i) = (P^t)_{ij}$.

For convenience $(P^t)_{ij}$ can be denoted as $p_{ij}^{(t)}$.

Given this second theorem we can define $p_{ij}^{(t)}$ as the t-step transition probability
from the state i to j (Chli, 2006), so we can now introduce the concept of an
invariant distribution (Chli, 2006), in which we say that $\lambda$ is invariant if

$$\lambda P = \lambda. \qquad (22)$$
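
The multiplication in (20), the t-step distribution of Theorem 5.3 and the invariance condition (22) can be illustrated with a small Python sketch. The two-state transition matrix and the helper names `step` and `distribution_at` are assumptions made purely for this example.

```python
def step(distribution, matrix):
    """One application of (20): (lambda P)_j = sum over i of lambda_i * p_ij."""
    size = len(matrix)
    return [sum(distribution[i] * matrix[i][j] for i in range(size))
            for j in range(size)]


def distribution_at(initial, matrix, t):
    """Pr(X^t = j) = (lambda P^t)_j, obtained by applying (20) t times."""
    current = list(initial)
    for _ in range(t):
        current = step(current, matrix)
    return current


# Hypothetical two-state stochastic matrix: every row sums to one.
P = [[0.9, 0.1],
     [0.5, 0.5]]

invariant = [5 / 6, 1 / 6]   # satisfies lambda P = lambda for this P
print(step(invariant, P))
print(distribution_at([1.0, 0.0], P, 50))
```

For this matrix the distribution (5/6, 1/6) is left unchanged by one step up to floating-point rounding, and the distribution after fifty steps from a different starting point lies very close to it.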

The next theorem will link the existence of an invariant distribution, which 
is an algebraic property of the matrix P, with the probabilistic concept of an 
equilibrium distribution. This only applies to a restricted class of Markov chains, 
namely those with irreducible and aperiodic stochastic matrices. However, there 
is a multitude of analogous results for other types of Markov chains to which 
we can refer (Norris, 1997; Cox and Miller, 1977), and the following theorem
is provided as an example of the family of theorems that apply. An irreducible
matrix P is one for which, for all $i, j \in I$, there is a sufficiently large t with
$p_{ij}^{(t)} > 0$, and it is aperiodic if for all states $i \in I$ we have $p_{ii}^{(t)} > 0$ for all
sufficiently large t (Chli, 2006; Norris, 1997; Cox and Miller, 1977). The meaning of
these properties can broadly be explained as follows. An irreducible Markov chain is
a chain where all states intercommunicate. For this to happen, there needs to be a
non-zero probability to go from any state to any other state. This communication can
happen in any number t of time steps. This leads to the condition $p_{ij}^{(t)} > 0$ for
all i and j. An aperiodic Markov chain is a chain where all states are aperiodic.
A state is aperiodic if it is not periodic. Finally, a state is periodic if subsequent
occupations of this state occur at regular multiples of a time interval. For this to
happen, $p_{ii}^{(t)}$ has to be zero whenever t is not an integral multiple of some number
greater than one. This leads to the condition $p_{ii}^{(t)} > 0$, for all sufficiently large t,
for aperiodicity. For further explanations, please refer to
(Cox and Miller, 1977).



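Theorem 5.4: Let P be irreducible, aperiodic and have an invariant distribution,
let $\lambda$ be any distribution, and suppose that $(X^t)_{t \geq 0}$ is Markov($\lambda$, P) (Chli, 2006),
then

$$\Pr(X^t = j) \to p_j^{\infty} \text{ as } t \to \infty \text{ for all } j \in I \qquad (23)$$

and

$$p_{ij}^{(t)} \to p_j^{\infty} \text{ as } t \to \infty \text{ for all } i, j \in I. \qquad (24)$$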



We can now view a system S as a countable set of states I with implicitly
defined transitions P between them, such that at time t the state of the system
is the random variable $X^t$, with the key assumption that $(X^t)_{t \geq 0}$ is Markov($\lambda$, P)
(Chli, 2006; Norris, 1997; Cox and Miller, 1977). 



Definition 5.5: The system S is said to be stable when the distribution of its
states converges to an equilibrium distribution,

$$\Pr(X^t = j) \to p_j^{\infty} \text{ as } t \to \infty \text{ for all } j \in I. \qquad (25)$$

More intuitively, the system S, a stochastic process $X^0, X^1, X^2, \ldots$, is stable if
the probability distribution of $X^t$ becomes independent of the time index t for
large t (Chli et al., 2003). Most Markov chains with a finite state-space and
positive transition probabilities are examples of stable systems, because after an
initialisation period they stabilise on a stationary distribution (Chli, 2006).
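
The notion of stability in (25) can be explored empirically with a short Python sketch. It is only a numerical check under stated assumptions, not a proof of convergence: the two transition matrices are invented for illustration, and `has_converged` is a hypothetical helper that tests whether one further step still changes the state distribution after many steps.

```python
def step(distribution, matrix):
    """(lambda P)_j = sum over i of lambda_i * p_ij, as in (20)."""
    size = len(matrix)
    return [sum(distribution[i] * matrix[i][j] for i in range(size))
            for j in range(size)]


def has_converged(initial, matrix, steps=1000, tolerance=1e-9):
    """True if, after `steps` steps, one more step changes no entry by `tolerance`."""
    current = list(initial)
    for _ in range(steps):
        current = step(current, matrix)
    following = step(current, matrix)
    return max(abs(a - b) for a, b in zip(current, following)) < tolerance


# Irreducible and aperiodic: all transition probabilities are positive.
aperiodic = [[0.9, 0.1],
             [0.5, 0.5]]
# Irreducible but periodic: the chain alternates between its two states.
periodic = [[0.0, 1.0],
            [1.0, 0.0]]

print(has_converged([1.0, 0.0], aperiodic))  # True
print(has_converged([1.0, 0.0], periodic))   # False
```

The first chain settles on a distribution that no longer depends on the time index, while the second keeps oscillating from this starting point, which is why the aperiodicity condition matters for Theorem 5.4.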

A MAS can be viewed as a system S, with the system state represented by a
finite vector X, having dimensions large enough to manage the agents present in
the system. The state vector will consist of one or more elements for each agent,
and a number of elements to define general properties² of the system state. Hence
there are many more states of the system (different state vectors) than there are 
agents.

5.2 Extensions for Evolving Populations



Having introduced Chli-DeWilde stability, we now consider the
Evolutionary Computing of our Digital Ecosystems in sufficient detail
to derive our extensions of Chli-DeWilde stability to
our Digital Ecosystems.

Evolutionary Computing is now recognised as a sub-field of artificial
intelligence (more particularly computational intelligence) that involves 
combinatorial optimisation problems (Baeck et al., 1997). Evolutionary algorithms 
are based upon several fundamental principles from biological evolution, including 
reproduction, mutation, recombination (crossover), natural selection, and survival 
of the fittest. As in biological populations, evolution occurs by the repeated 
application of the above operators (Back, 1996). An evolutionary algorithm 
operates on the collection of individuals making up a population. An individual, in 
the natural world, is an organism with an associated fitness (Lawrence, 2005). So, 
candidate solutions to an optimisation problem play the role of individuals in a 
population, and a cost (fitness) function determines the environment within which 
the solutions live, analogous to the way the environment selects for the fittest 
individuals. The number of individuals varies between different implementations 
and may also vary during the use of an evolutionary algorithm. Each individual 
possesses some characteristics that are defined through its genotype, its genetic 
composition, which will be passed onto the descendants of that individual (Back, 
1996). Processes of mutation (small random changes) and crossover (generation 
of a new genotype by the combination of components from two individuals) may 
occur, resulting in new individuals with genotypes differing from the ancestors 
they will come to replace. These processes iterate, modifying the characteristics 
of the population (Back, 1996). Which members of the population are kept, 
and used as parents for offspring, depends on the fitness (cost) function of the 
population. This enables improvement to occur (Back, 1996), and corresponds to 
the fitness of an organism in the natural world (Lawrence, 2005). Recombination 
and mutation create the necessary diversity and thereby facilitate novelty, while 
selection acts as a force increasing quality. Changed pieces of information resulting 
from recombination and mutation are randomly chosen. Selection operators can 
be either deterministic, or stochastic. In the latter case, individuals with a higher 
fitness have a higher chance to be selected than individuals with a lower fitness 
(Back, 1996). 

So, extending Chli-DeWilde stability to the class of MASs that make use 
of evolutionary computing algorithms, including our evolving agent populations, 
requires consideration of the following issues: the inclusion of population dynamics, 
and an understanding of population macro-states. 

5.2.1 Population Dynamics 

First, the MAS of an evolving agent population is composed of n agent-sequences,
with each agent-sequence i in a state $\xi_i^t$ at time t, where i = 1, 2, ..., n. The states
of the agent-sequences are random variables, so that the state vector for the MAS
is a vector of random variables $\xi^t$, with the time being discrete, t = 0, 1, .... The
interactions among the agent-sequences are noisy, and are given by the probability
distributions

$\Pr(X_i | Y) = \Pr(\xi_i^{t+1} = X_i | \xi^t = Y), \quad i = 1, \ldots, n,$ (26)

where $X_i$ is a value for the state of agent-sequence i, and Y is a value for the state
vector of the MAS. The probabilities implement a Markov process, with the noise
caused by mutations. Furthermore, the agent-sequences are each subjected to the
selection pressure from the environment of the system, which is applied equally
to all the agent-sequences of the population. So, the probability distributions are
statistically independent, and

$\Pr(X | Y) = \prod_{i=1}^{n} \Pr(\xi_i^{t+1} = X_i | \xi^t = Y).$ (27)

If the occupation probability of state X at time t is denoted by $p_X^{(t)}$ then

$p_X^{(t)} = \sum_Y \Pr(X | Y) \, p_Y^{(t-1)}.$ (28)



This is a discrete time equation used to calculate the evolution of the state 
occupation probabilities from t = 0, while equation (27) is the probability of 
moving from one state to another. The MAS (evolving agent population) is
self-stabilising if the limit distribution of the occupation probabilities exists and is
non-uniform, i.e.

$p_X^\infty = \lim_{t \to \infty} p_X^{(t)}$ (29)

exists for all states X, and there exist states X and Y such that

$p_X^\infty \neq p_Y^\infty.$ (30)

These equations define that some configurations of the system, after an extended
time, will be more likely than others, because the likelihood of their occurrence no
longer changes. Such a system is stable, because the probability of occurrence of each
state no longer changes with time, and this is the definition of stability developed in
(Chli et al., 2003). Equation (29) is the probabilistic equivalent of an attractor$^3$ in a
system with deterministic interactions, which we had to extend to a stochastic process
because mutation is inherent in evolutionary dynamics.
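
Equations (27) to (30) can be sketched numerically on a toy population. The following is a minimal illustration under stated assumptions, not the simulation used in this paper: the two per-sequence states, the fitness values, the mutation rate and the copying rule inside `agent_transition` are all hypothetical choices made only so that the example is small enough to enumerate.

```python
from itertools import product

AGENT_STATES = ("a", "b")        # hypothetical states; "b" is the fitter one
FITNESS = {"a": 1.0, "b": 2.0}   # hypothetical fitness values
MUTATION = 0.05                  # hypothetical mutation (noise) rate
N_SEQUENCES = 2                  # number of agent-sequences in the toy MAS

# Every state vector of the MAS, e.g. ("a", "b")
SYSTEM_STATES = list(product(AGENT_STATES, repeat=N_SEQUENCES))

def agent_transition(x_i, y):
    """Pr(X_i | Y): fitness-proportional copying from Y, followed by mutation noise."""
    total = sum(FITNESS[s] for s in y)
    copied = sum(FITNESS[s] for s in y if s == x_i) / total
    return copied * (1 - MUTATION) + (1 - copied) * MUTATION

def joint_transition(x, y):
    """Equation (27): product of the independent per-sequence probabilities."""
    prob = 1.0
    for x_i in x:
        prob *= agent_transition(x_i, y)
    return prob

def evolve(occupation):
    """Equation (28): p_X^(t) = sum over Y of Pr(X | Y) * p_Y^(t-1)."""
    return {x: sum(joint_transition(x, y) * occupation[y] for y in SYSTEM_STATES)
            for x in SYSTEM_STATES}

def limit_distribution(steps=1000):
    """Approximate equation (29) by iterating equation (28) for many steps."""
    occupation = {x: 0.0 for x in SYSTEM_STATES}
    occupation[("a", "a")] = 1.0   # start with every sequence in the weaker state
    for _ in range(steps):
        occupation = evolve(occupation)
    return occupation

def is_non_uniform(limit, tol=1e-6):
    """Equation (30): some pair of states has different limit probabilities."""
    return max(limit.values()) - min(limit.values()) > tol

limit = limit_distribution()
```

In this toy setting the state in which both sequences are in the fitter state "b" ends up with the largest limit probability, while mutation keeps every other state at a non-zero probability.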

While the number of agents in the Chli-DeWilde formalism varies, we require
variation according to the selection pressure acting upon the evolving agent
population. We must therefore formally define and extend the definition of dead
agents, by introducing a new state d for each agent-sequence. If an agent-sequence
is in this state, $\xi_i^t = d$, then it is dead and does not affect the state of other
agent-sequences in the population. If an agent-sequence i has low fitness then that
agent-sequence will likely die, because

$\Pr(d | Y) = \Pr(\xi_i^{t+1} = d | \xi^t = Y)$ (31)

will be high for all Y. Conversely, if an agent-sequence has high fitness, it will likely
replicate and assume the state of a similarly successful agent-sequence (mutant),
or crossover might occur changing the state of the successful agent-sequence and
another agent-sequence.

5.2.2 Population Macro-States

The state of the system S, an evolving agent population, is determined by the
collection of agents of which it consists at a specific time t, which potentially
changes as the time increases to t+1. This collection of agents will have varying
fitness values, and so the one with the highest fitness at the current time t is the
current maximum fitness individual. For example, in an evolving agent population
with individuals ranging in fitness between 36.2% and 45.8%, the current
maximum fitness individual (agent) is the one with a fitness of 45.8%. So, we
can define a macro-state M as a set of states (evolving agent populations) with
a common property, here possessing at least one copy of the current maximum
fitness individual. Therefore, by its definition, each macro-state M must also
have a maximal state composed entirely of copies of the current maximum fitness
individual. If the population size is not fixed (it is not in nature, but can be in
evolutionary computing), the state space of the evolving agent population is infinite,
but in practice would be bounded by resource availability. So, there is also an infinite
number of configurations for an evolving agent population that has the same
current maximum fitness individual.

So, the state-space I of the system (evolving agent population) S can be
grouped into a set of macro-states {M}. For one macro-state, which we will call the
maximum macro-state $M_{max}$, the current maximum fitness individual will be the
global maximum fitness individual, which is the optimal solution (fittest individual)
that the evolutionary computing process can reach through the evolving agent
population (system) S. For example, in an evolving agent population at its maximum
macro-state $M_{max}$, with individuals ranging in fitness between 88.8% and
96.8%, the global maximum fitness individual (agent) is the one with a fitness of
96.8% and there will be no fitter agent. We can therefore refer to all other
macro-states of the system S as sub-optimal macro-states, as there can be only
one maximum macro-state $M_{max}$.
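
The macro-state definitions above can be sketched as simple predicates over a population. This is a minimal sketch under the assumption that a population is represented only by the list of its individuals' fitness values (in percent); the function names are hypothetical and are not part of our simulation.

```python
def macro_state_key(population):
    """A macro-state is identified by the current maximum fitness in the population."""
    return max(population)

def in_maximum_macro_state(population, global_max, tol=1e-9):
    """True if at least one individual has the global maximum fitness."""
    return abs(max(population) - global_max) <= tol

def is_maximal_state(population, level, tol=1e-9):
    """True if the population consists entirely of copies at the given fitness level."""
    return all(abs(f - level) <= tol for f in population)

# Fitness values taken from the two examples in the text
sub_optimal = [36.2, 41.0, 45.8]
at_maximum = [88.8, 93.1, 96.8]
GLOBAL_MAX = 96.8
```

Here `sub_optimal` belongs to the macro-state keyed by 45.8, while `at_maximum` is in the maximum macro-state without being its maximal state, because not every individual has a fitness of 96.8.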



Figure 11 State-Space of an Evolving Agent Population: A possible evolutionary
path through the state-space I is shown, with the selection pressure of the
evolutionary process driving it towards the maximal state of the maximum
macro-state $M_{max}$.

We can consider the macro-states of an evolving agent population visually
through the representation of the state-space I of the system S shown in Figure 11,
which includes a possible evolutionary path through the state-space I. Traversal
through the state-space I is directed by the selection pressure of the evolutionary
process acting upon the population S, driving it towards the maximal state of
the maximum macro-state $M_{max}$, which consists entirely of copies of the optimal
solution, and is the equilibrium state that the system S is forever falling towards
without ever quite reaching, because of the mutation (noise) within the system. So,
while this maximal state will never be reached, the maximum macro-state $M_{max}$
itself is certain to be reached, provided the system does not get trapped at local
optima, i.e. the probability of being in the maximum macro-state $M_{max}$ at infinite
time is one, $p_{M_{max}}^\infty = 1$, as defined from equation (28).

Furthermore, we can define quantitatively the probability distribution of the
macro-states that the system will occupy at infinite time. For a stable system,
as defined by equation (30), the degree of instability, $d_{ins}$, can be defined as the
entropy of its probability distribution at infinite time,

$d_{ins} = H(p^\infty) = -\sum_X p_X^\infty \log_N(p_X^\infty),$ (32)

where N is the number of possible states, and taking log to the base N normalises
the degree of instability. The degree of instability will range between zero (inclusive)
and one (exclusive), because the maximum instability of one would only occur
during the theoretical extreme scenario of a non-discriminating selection pressure.
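
The degree of instability is a normalised entropy, which can be sketched in a few lines. This is an illustrative sketch only; the function name and the example distributions are hypothetical, and the convention that states with zero probability contribute nothing to the sum is the usual one for entropy.

```python
import math

def degree_of_instability(limit_probabilities):
    """Entropy of the limit distribution with logarithm to the base N (number of states)."""
    n = len(limit_probabilities)
    if n < 2:
        raise ValueError("at least two states are needed to normalise the entropy")
    # States with zero probability contribute nothing to the entropy
    return 0.0 - sum(p * math.log(p, n) for p in limit_probabilities if p > 0)

# All probability in one state: no instability
fully_stable = degree_of_instability([1.0, 0.0, 0.0, 0.0])

# Uniform distribution (non-discriminating selection pressure): maximum instability
fully_unstable = degree_of_instability([0.25, 0.25, 0.25, 0.25])

# A non-uniform distribution falls strictly between the two extremes
intermediate = degree_of_instability([0.7, 0.1, 0.1, 0.1])
```

The uniform case gives one and a single occupied state gives zero, matching the range described above.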



6 Diversity 

A definition for the self-organised diversity of an evolving agent population should 
define the optimal variability, of the agents and agent-sequences, that emerge over 
time, with no initial constraints from modelling approaches for the inclusion of 
pre-defined specific behaviour, but capable of representing the appearance of such 
behaviour should it occur. 

None of the proposed definitions are applicable for the self-organised diversity 
of an evolving agent population. The ε-machine modelling (Crutchfield and
Görnerup, 2006) is not applicable, because it is only defined within the context of
pre-biotic populations. Neither is the Minimum Description Length (Barron et al.,
1998) principle or the Prügel-Bennett Shapiro formalism (Prügel-Bennett, 1997)
suitable, because they necessitate the involvement of subjective human judgement 
at the critical stages of model or quantifier selection. Mean Field Theory is also 
not applicable because of the necessity of a neighbourhood model for defining 
interaction, and evolving agent populations lack a 2D or 3D metric space for such 
models. So, the only available neighbourhood model becomes a distance measure on 
a parameter space that measures dissimilarity. However, this type of neighbourhood 
model cannot represent the information-based interactions between the individuals 
of an evolving agent population. 

We suggest that the uniqueness of Digital Ecosystems makes the application 
of existing definitions inappropriate for the self-organised diversity, because while
we could extend a biology-centric definition for the self-organised complexity, and
a computing-centric definition for the self-organised stability, we found neither 
of these approaches, or any other, appropriate for the self-organised diversity. 
The Digital Ecosystem being the digital counterpart of a biological ecosystem 
gives it unique properties. So, the evolving agent populations possess properties 
of both computing systems (e.g. agent systems) as well as biological systems 
(e.g. population dynamics), and the combination of these properties makes them 
unique. So, we will further consider the evolving agent populations to create a 
definition for their self-organised diversity. 

6.1 Evolving Agent Populations 

The self-organised diversity of an evolving agent population comes from the agent- 
sequences it evolves, in response to the selection pressure, seeded with agents and 
agent-sequences from the agent-pool of the habitat in which it is instantiated. 
The set of agents and agent-sequences available when seeding an evolving agent 
population is regulated over time by other evolving agent populations, instantiated 
in response to other user requests, leading to the death and migration of agents 
and agent-sequences, as well as the formation of new agent-sequence combinations. 
The seeding of existing agent-sequences provides a direction to accelerate the 
evolutionary process, and can also affect the self-organised diversity, for example, 
if only a proportion of any available global optima is favoured. So, the set of agents 
available when seeding an evolving agent population provides potential for the 
self-organised diversity, while the selection pressure of a user request provides a 
constraining factor on this potential. Therefore, the optimality of the self-organised 
diversity of an evolving agent population is relative to the selection pressure of the 
user request for which it was instantiated. 

While we could measure the self-organised diversity of individual evolving 
agent populations, or even take a random sampling, it will be more informative 
to consider their collective self-organised diversity. Additionally, given that the 
Digital Ecosystem is required to support a range of user behaviour, we can consider 
the collective self-organised diversity of the evolving agent populations relative to 
the global user request behaviour. So, when varying a behavioural property of the 
user requests according to some distribution, we would expect the corresponding 
property of the evolving agent populations to follow the same distribution. While 
not intending to prescribe the expected user behaviour of the Digital Ecosystem, 
we do wish to investigate whether the Digital Ecosystem can adapt to a range 
of user behaviour. So, we will consider Uniform, Gaussian (Normal) and Power 
Law distributions for the parameters of the user request behaviour. The Uniform 
distribution will provide a control, while the Normal (Gaussian) distribution will 
provide a reasonable assumption for the behaviour of a large group of users, and 
the Power Law distribution will provide a relatively extreme variation in user 
behaviour. 

We therefore simulated the Digital Ecosystem, varying aspects of the user 
behaviour according to different distributions, and measuring the related aspects 
of the evolving agent populations. This consisted of a mechanism to vary the 
user request properties of length and modularity (number of attributes per atomic 
service), according to Uniform, Gaussian (normal) and Power Law distributions,
and a mechanism to measure the corresponding application (agent-sequence)
properties of size and number of attributes per agent. For statistical significance
each scenario (experiment) was averaged from ten thousand simulation runs.
We expect it will be obvious whether the observed behaviour of the Digital
Ecosystem matches the expected behaviour from the user base. Nevertheless, we
will also implement a chi-squared ($\chi^2$) test to confirm if the observed behaviour
(distribution) of the agent-sequence properties matched the expected behaviour
(distribution) from the user request properties.
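
The chi-squared comparison can be sketched as follows. This is a minimal illustration with hypothetical counts, not our experimental data: it computes Pearson's statistic for binned observations against expected proportions and compares it with the standard critical value for four degrees of freedom at the 5% significance level.

```python
def chi_squared_statistic(observed_counts, expected_proportions):
    """Pearson's chi-squared statistic for binned counts against expected proportions."""
    total = sum(observed_counts)
    statistic = 0.0
    for observed, proportion in zip(observed_counts, expected_proportions):
        expected = total * proportion
        statistic += (observed - expected) ** 2 / expected
    return statistic

# Hypothetical example: agent-sequence lengths binned into five classes,
# compared with a Uniform distribution of user request lengths
observed = [18, 22, 20, 21, 19]
expected = [0.2] * 5

statistic = chi_squared_statistic(observed, expected)

# Critical value for 5 - 1 = 4 degrees of freedom at the 5% significance level
CRITICAL_VALUE = 9.488
matches_expected = statistic < CRITICAL_VALUE
```

A statistic below the critical value means the hypothesis that the observed distribution follows the expected one is not rejected at that significance level.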

7 Simulation and Results 

We simulated the Digital Ecosystem, using our simulation from section 2 and
(Briscoe, 2009), including simulated populations of agent-sequences, [$A_1$, $A_1$, $A_2$,
...], which were evolved to solve user requests, seeded with agents from the
agent-pool of 20 agents from the habitats in which they were instantiated. A dynamic
population size was used to ensure exploration of the available combinatorial
search space, increasing with the average size of the population's agent-sequences.
The optimal combination of agents (agent-sequence) was evolved to the user
request R, by an artificial selection pressure created by a fitness function generated
from the user request R. An individual (agent) of the population consisted of a set
of attributes, $a_1, a_2, \ldots$, and a user request consisted of a set of required attributes,
$r_1, r_2, \ldots$. So, the fitness function for evaluating an individual agent-sequence A,
relative to a user request R, was

$fitness(A, R) = \frac{1}{1 + \sum_{r \in R} |r - a|},$ (33)

where a is the member of A such that the difference to the required attribute r was
minimised. The abstract agent descriptions were based on existing and emerging
technologies for semantically capable Service-Oriented Architectures (Rajasekaran
et al., 2004), such as the OWL-S semantic markup for web services (Martin
et al., 2004). We simulated an agent's semantic description with an abstract
representation consisting of a set of attributes, to simulate the properties of
a semantic description. Each attribute represented a property of the semantic
description, with a value ranging between one and a hundred. Each simulated agent
was initialised with a semantic description of between three and six attributes, which
would then evolve in number and content.
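
The fitness function of equation (33) can be sketched as below. This is a minimal sketch under stated assumptions: an agent is represented as a list of integer attributes, an agent-sequence as a list of agents, and the attribute a closest to each required attribute r is searched for across all agents of the sequence. That data layout is our illustrative choice and not necessarily the one used in the simulation.

```python
def fitness(agent_sequence, request):
    """Equation (33): 1 / (1 + sum over r in R of |r - a|), with a the closest attribute to r."""
    # Assumption: attributes are pooled across every agent in the sequence
    attributes = [a for agent in agent_sequence for a in agent]
    if not attributes:
        raise ValueError("the agent-sequence has no attributes")
    distance = sum(min(abs(r - a) for a in attributes) for r in request)
    return 1.0 / (1.0 + distance)

# Every required attribute is matched exactly, so the fitness is maximal
perfect = fitness([[10, 20], [30]], [10, 30])

# Required attributes 12 and 25 are closest to 10 and 20: distance 2 + 5 = 7
partial = fitness([[10, 20]], [12, 25])
```

The value is 1.0 only when every required attribute is matched exactly, and it decreases towards zero as the total distance grows.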

Equation (33) was used to assign fitness values between 0.0 and 1.0 to each
individual of the current generation of the population, directly affecting their
ability to replicate into the next generation. The evolutionary computing process
was encoded with a low mutation rate, fixed selection pressure and non-trapping
fitness function (i.e. did not get trapped at local optima$^4$). The type of selection
used was fitness-proportional and non-elitist. Fitness-proportional means that the
fitter the individual the higher its probability of surviving to the next generation.
Non-elitist means that the best individual from one generation was not guaranteed
to survive to the next generation; it had a high probability of surviving into
the next generation, but it was not guaranteed as it might have been mutated.
Crossover/recombination was then applied to a randomly chosen 10% of the
surviving population. Mutations were then applied to a randomly chosen 10%
of the surviving population; point mutations were randomly located, consisting
of insertions (an agent was inserted into an agent-sequence), replacements (an
agent was replaced in an agent-sequence), and deletions (an agent was deleted
from an agent-sequence). The issue of bloat was controlled by augmenting the
fitness function with a parsimony pressure to bias the search to smaller
agent-sequences, evaluating larger than average agent-sequences with a reduced fitness,
and therefore providing a dynamic control limit adaptive to the average size of the
individuals of the ever-changing evolving agent populations.
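
The selection and mutation steps described above can be sketched as follows. This is a simplified illustration, not the simulation code: crossover and the parsimony pressure are omitted, agent-sequences are assumed to be non-empty lists of agent identifiers, a deletion is skipped when it would empty a sequence, and at least one individual is mutated per generation. All function and parameter names are hypothetical.

```python
import random

def select(population, fitnesses, rng):
    """Fitness-proportional, non-elitist selection: sampling with replacement."""
    return rng.choices(population, weights=fitnesses, k=len(population))

def point_mutate(sequence, agent_pool, rng):
    """Apply one randomly located insertion, replacement or deletion."""
    mutant = list(sequence)
    kind = rng.choice(("insert", "replace", "delete"))
    if kind == "insert":
        mutant.insert(rng.randrange(len(mutant) + 1), rng.choice(agent_pool))
    elif kind == "replace":
        mutant[rng.randrange(len(mutant))] = rng.choice(agent_pool)
    elif len(mutant) > 1:
        # Assumption: never delete the last remaining agent
        del mutant[rng.randrange(len(mutant))]
    return mutant

def next_generation(population, fitnesses, agent_pool, rng, mutation_fraction=0.1):
    """Select survivors, then mutate a randomly chosen fraction of them."""
    survivors = [list(s) for s in select(population, fitnesses, rng)]
    n_mutants = max(1, round(mutation_fraction * len(survivors)))
    for index in rng.sample(range(len(survivors)), n_mutants):
        survivors[index] = point_mutate(survivors[index], agent_pool, rng)
    return survivors
```

Because survivors are sampled with replacement and may then be mutated, the fittest individual is likely but not guaranteed to appear unchanged in the next generation, which is the non-elitist behaviour described above.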

Our simulation included our extended Physical Complexity and Efficiency,
which required implementing the $C_V$ of (10), the $\ell_V$ of (18) and the $H_V$ of (9) for
the per-site entropies. The Efficiency $E_c$ (19), for populations with clusters, was
also implemented in the simulation. It also included our extended Chli-DeWilde
stability and degree of instability, which required calculating $p_X^{(t)}$ of (28) to estimate
the stability, and $p_X^\infty$ of (29) to prove the existence of $p_X^\infty \neq p_Y^\infty$ from (30). The
degree of instability, $d_{ins}$ of (32), was also implemented in the simulation.

For the diversity experiments we included a way to vary aspects of the user
behaviour according to different distributions, and a way to measure the related
aspects of the evolving agent populations. This consisted of a mechanism to
vary the user request properties of length and modularity, according to Uniform,
Gaussian (normal) and Power Law distributions, and a mechanism to measure the
corresponding agent(-sequence) properties of length and number of attributes.
For statistical significance each scenario (experiment) was averaged from ten
thousand simulation runs. We expected it to be obvious whether the observed
behaviour of the Digital Ecosystem matched the expected behaviour from the user
base. Nevertheless, we also implemented a chi-squared ($\chi^2$) test to determine if
the observed behaviour (distribution) of the agent(-sequence) properties matched
the expected behaviour (distribution) from the user request properties.

7.1 Complexity

Figure 12 shows, for a typical evolving agent population, the Physical Complexity
$C_V$ (10) for variable length sequences and the maximum fitness $F_{max}$ over the
generations. It shows that the fitness and our extended Physical Complexity both
increase over the generations, synchronised with one another, until generation 160
when the maximum fitness tapers off more slowly than the Physical Complexity.
At this point the optimal length for the sequences is reached within the simulation,
and so the advent of new fitter sequences (of the same or similar length) creates
only minor fluctuations in the Physical Complexity, while having a more significant
effect on the maximum fitness. The Physical Complexity increases over the
generations because of the increasing information being stored, with the sharp
increases occurring when the effective length $\ell_V$ of the population increases. The
temporary decreases, such as the one beginning at generation 138, are preceded by
the advent of a new fitter mutant, as indicated by a corresponding sharp increase in
the maximum fitness in the immediately preceding generations, which temporarily
disrupts the self-organised complexity of the population, until this new fitter mutant
becomes
dominant and leads to a new higher level of self-organised complexity. The
similarity of the graph in Figure 12 to the graphs in (Adami et al., 2000) confirms
that the Physical Complexity measure has been successfully extended to variable
length sequences.

Figure 12 Graph of Physical Complexity and Maximum Fitness: The Physical
Complexity for variable length sequences increases over the generations,
showing short-term decreases as expected, such as at generation 138.



7.1.1 Efficiency 

Figure 13 is a visualisation of the simulation, showing two alternate populations 
run for a thousand generations, with the one on the left from Figure 12 run under 
normal conditions, while the one on the right was run with a non-discriminating 
selection pressure. Each multi-coloured line represents an agent-sequence, while 
each colour represents an agent (site). The visualisation shows that our Efficiency 
E accurately measures the self-organised complexity of the two populations. It 
also shows significant variation in the population run under normal conditions, as 
the evolutionary computing process creates the opportunity to find fitter (better) 
sequences, providing potential to avoid getting trapped at local optima. 

Figure 14 shows the Efficiency E (11), over the generations, for the population
from Figure 12. The Efficiency tends to a maximum of one, indicating that the
population consists of one cluster, which is confirmed by the visualisation of
the population in Figure 13 (left). The significant decreases that occurred in
the Efficiency, reducing in magnitude and frequency over the generations, mirror
the fluctuations that occurred in the complexity $C_V$, because
the Efficiency E (11) is the complexity $C_V$ (10) over the complexity potential
$C_{V_P}$ (6). These falls are caused by the creation of fitter (better) mutants within
the population, which eventually become the dominant genotype, but during the
process cause the Physical Complexity and the Efficiency to fall in the short-term.

Figure 13 Visualisation of Evolving Agent Populations at the 1000th Generation:
The population on the left, from Figure 12, was run under normal conditions,
while the other was run with a non-discriminating selection pressure.




7.1.2 Clustering 



To further investigate the self-organisation of evolving agent populations, we
simulated a typical population with a multi-objective selection pressure that had
two independent global optima, and so the potential to support two pure clusters
(each cluster using a unique subset of the alphabet D). The graph in Figure
15 shows the Efficiency E over the generations acting as a clustering coefficient,
oscillating around the included best fit curve, quite significantly at the start, and
then decreasing as the generations progressed. The Efficiency tended to 0.744, as
expected from (15) given the alphabet size was fifteen, |D|=15, and the number
of clusters was two, |T|=2. The tending itself indicated clustering, while the
value it tended to indicated, as expected, the presence of two clusters in the
population. The initial severe oscillations were caused by the creation and spread
of fitter longer mutants (agent-sequences) in the population, causing the Physical
Complexity and therefore the Efficiency to fluctuate significantly. A visualisation
of the population is shown in Figure 16, in which the agent-sequences were
grouped to show the two clusters. As expected from (18) each cluster had a much
higher Physical Complexity and Efficiency compared to the population as a whole.
However, the Efficiency $E_c$ is immune to the clusters and therefore calculated the
self-organisation of the population correctly.

Figure 15 Graph of the Clustering Coefficient: The Efficiency tended to 0.744, as
expected from (15) given the alphabet size was fifteen, |D|=15, and the
number of clusters was two, |T|=2, indicating more than one cluster.

Figure 16 Visualisation of Clusters in an Evolving Agent Population at the 1000th
Generation: The agent-sequences were grouped to show the two clusters.
[Panel title: Population: $C_V$ = 14.60, E = 73.2%, $E_c$ = 98.0%. Axis label: Agent-sequence (length).]

7.1.3 Summary 

Collectively, the experimental results confirm that Physical Complexity has been 
successfully extended to evolving agent populations. Most significantly, Physical 
Complexity has been reformulated algebraically for populations of variable length 
sequences, which we have confirmed experimentally through simulations. Our 
Efficiency definition provides a macroscopic value to characterise the level of 
complexity. Furthermore, the clustering coefficient defined by the tending of the 
Efficiency, not only indicates clustering, but can also distinguish between a single 
cluster population and a population with clusters. The number of clusters can 
even be determined, for pure clusters, from the value to which the clustering 
coefficient tends. Combined, this allows the Efficiency $E_c$ definition to provide a
normalised universally applicable macroscopic value to characterise the complexity 
of a population, independent of clustering, atomicity, length (variable or same), 
and size. 

7.2 Stability 

Our evolving agent population (a MAS with evolutionary dynamics) is stable if
the distribution of the limit probabilities exists and is non-uniform, as defined
by equations (29) and (30). The simplest scenario is a typical evolving agent
population with a single global optimal solution, which is stable if there are at least
two macro-states with different limit occupation probabilities. We shall consider
the maximum macro-state $M_{max}$ and the sub-optimal macro-state $M_{half}$, where
the states of the macro-state $M_{max}$ each possess at least one individual with global
maximum fitness,

$p_{M_{max}}^\infty = \lim_{t \to \infty} p_{M_{max}}^{(t)} = 1,$

while the states of the macro-state $M_{half}$ each possess at least one individual with
a fitness equal to half of the global maximum fitness,

$p_{M_{half}}^\infty = \lim_{t \to \infty} p_{M_{half}}^{(t)} = 0,$

thereby fulfilling the requirements of equations (29) and (30). The sub-optimal
macro-state $M_{half}$, having a lower fitness, we predict to be seen earlier in
the evolutionary process before disappearing as higher fitness macro-states are
reached. The system S will take longer to reach the maximum macro-state $M_{max}$,
but once it does will likely remain, leaving only briefly depending on the strength
of the mutation rate, as the selection pressure is non-elitist.

A value of t = 1000 was chosen to represent $t = \infty$ experimentally, because the
simulation has often been observed to reach the maximum macro-state $M_{max}$ by
500 generations. Therefore, the probability of the system S being in the maximum
macro-state $M_{max}$ at the thousandth generation is expected to be one, $p_{M_{max}}^{(1000)} = 1$.
Furthermore, the probability of the system being in the sub-optimal macro-state
$M_{half}$ at the thousandth generation is expected to be zero, $p_{M_{half}}^{(1000)} = 0$. We can
therefore conclude that our extended Chli-DeWilde stability accurately models the
stability over time of evolving agent populations.

Figure 17 shows, for a typical evolving agent population, a graph of the
probability as defined by equation (28) of the maximum macro-state $M_{max}$ and
the sub-optimal macro-state $M_{half}$ at each generation, averaged from ten thousand
simulation runs to provide statistical significance. The behaviour of the simulated
system S was as expected, being in the maximum macro-state $M_{max}$ only after
generation 178 and always after generation 482. It was also observed being in
the sub-optimal macro-state $M_{half}$ only between generations 37 and 113, with a
maximum probability of 0.053 (3 d.p.) at generation 61; this probability was low
because the evolutionary path (state transitions) could avoid visiting the macro-state.
As we expected, the probability of being in the maximum macro-state $M_{max}$ by
the thousandth generation was one, $p_{M_{max}}^{(1000)} = 1$, and so the probability of being
in any other macro-state, including the sub-optimal macro-state $M_{half}$, by the
thousandth generation was zero, $p_{M_{half}}^{(1000)} = 0$.

Figure 17 Graph of the Probabilities of the Macro-States $M_{max}$ and $M_{half}$ at each
Generation: The system S, a typical evolving agent population, was in the
maximum macro-state $M_{max}$ only after generation 178 and always after
generation 482.
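
Estimating the macro-state probabilities from repeated runs amounts to counting, at each generation, the fraction of runs whose population is in the macro-state. The following is a minimal sketch with hypothetical data: each run is assumed to be recorded as a list of the maximum fitness at every generation, and a macro-state is identified by its fitness level, neither of which is necessarily how our simulation stores its results.

```python
def macro_state_probability(runs, generation, fitness_level, tol=1e-9):
    """Fraction of runs whose maximum fitness equals fitness_level at the given generation."""
    hits = sum(1 for trace in runs if abs(trace[generation] - fitness_level) <= tol)
    return hits / len(runs)

# Hypothetical traces: maximum fitness per generation for four short runs,
# expressed as a fraction of the global maximum fitness
runs = [
    [0.25, 0.50, 1.00, 1.00],
    [0.25, 0.75, 1.00, 1.00],
    [0.50, 0.50, 0.75, 1.00],
    [0.25, 0.75, 0.75, 1.00],
]

p_max = [macro_state_probability(runs, g, 1.0) for g in range(4)]
p_half = [macro_state_probability(runs, g, 0.5) for g in range(4)]
```

In this toy data the probability of the maximum macro-state rises to one by the final generation, while the half-fitness macro-state is only visited early and by some of the runs.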

A visualisation for the state of a typical evolving agent population at the 
thousandth generation is shown in Figure 18, with each line representing an 
agent-sequence and each colour representing an agent, with the identical agent- 
sequences grouped for clarity. It shows that the evolving agent population reached 
the maximum macro-state $M_{max}$ and remained there, but as we expected never
reached the maximal state of the maximum macro-state, where all the agent- 
sequences are identical and have maximum fitness, indicated by the lack of total 
uniformity in Figure 18. This was expected, because of the mutation (noise) within 
the evolutionary process, which is necessary to create the opportunity to find fitter 
(better) sequences and potentially avoid getting trapped at any local optima that 
may be present. We can therefore conclude that the macro-state interpretation of 
our extended Chli-DeWilde stability accurately models the state-space of evolving 
agent populations. 




[Figure 18 image; axis label: Agent-Sequence]

Figure 18 Visualisation of an Evolving Agent Population at the 1000th Generation: 
The population consists of multiple agent-sequences, with each line 
representing an agent-sequence, and each colour representing an individual
agent. 



7.3 Degree of Instability 

Given that our simulated evolving agent population is stable, as defined by 
equations (29) and (30), we can determine its degree of instability as defined by 
equation (32). So, calculated from its limit probabilities, the degree of instability 
was 

$$d_{ins} = H(p^{1000}) = -\sum_{X} p_X^{1000} \log_N(p_X^{1000}) = -1 \log_N(1) = 0$$
where $t = 1000$ is an effective estimate for $t = \infty$, as explained earlier. The
result was as we expected, because the probability of being in the maximum macro-state
$M_{max}$ by the thousandth generation was one, $p^{1000}_{max} = 1$, and so the probability
of being in the other macro-states by the thousandth generation was zero. The system therefore
shows no instability, as there is no entropy in the occupied macro-states at infinite 
time. We can therefore conclude that the degree of instability of our extended 
Chli-DeWilde stability can provide a macroscopic value to characterise the level of 
stability of evolving agent populations. 
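
The calculation above can be illustrated with a minimal Python sketch. It assumes that the degree of instability is the entropy of the macro-state probabilities with logarithms taken to base N, and that N is the number of possible macro-states; the function names and the two-state example are our own illustrative choices, not part of the simulation software.

```python
import math
from collections import Counter


def degree_of_instability(probabilities, n_states):
    """Entropy of a macro-state distribution, using logarithms to base n_states."""
    if n_states < 2:
        raise ValueError("n_states must be at least 2 for a base-N logarithm")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to one")
    # Terms with p = 0 are skipped, since p * log(p) tends to 0 as p tends to 0.
    return sum(-p * math.log(p, n_states) for p in probabilities if p > 0)


def estimate_distribution(final_macro_states):
    """Relative frequency of each macro-state label observed across runs."""
    counts = Counter(final_macro_states)
    total = len(final_macro_states)
    return [count / total for count in counts.values()]


# The case reported above: all probability mass in the maximum macro-state.
print(degree_of_instability([1.0, 0.0], n_states=2))  # 0.0
```

With this normalisation a uniform distribution over all N macro-states would give the maximum value of one, so the value of zero obtained here corresponds to a system that always ends in the same macro-state.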

7.4 Stability Analysis 

We then performed a stability analysis (akin to a sensitivity analysis (Cacuci et al.,
2003)) of a typical evolving agent population, varying its key parameters while
measuring its stability. We varied the mutation and crossover rates from 0% to
100% in 10% increments to provide a sufficient density of measurements to identify
any trends that might be present, calculating the degree of instability, $d_{ins}$ from
equation (32), at the thousandth generation. These degree of instability values were
averaged over 10 000 simulation runs to ensure statistical significance, and graphed
against the mutation and crossover rates in Figure 19. It showed that the crossover rate
had little effect on the stability of the simulated evolving agent population, whereas the
mutation rate did have a significant effect on the stability. With the mutation rate
at or below 60%, the evolving agent population showed no instability, with
$d_{ins}$ values equal to zero as the system S was always in the same macro-state M
at infinite time, independent of the crossover rate. With the mutation rate above
60% there was a significant increase in instability, with the system being in one of
several different macro-states at infinite time; with a mutation rate of 70% it was
still very stable, having low $d_{ins}$ values ranging between 0.08 and 0.16, but once
the mutation rate was 80% or greater it became quite unstable, shown by high
$d_{ins}$ values nearing 0.5.

Figure 19 Graph of Stability with Different Mutation and Crossover Rates: With the
mutation rate at or below 60%, the evolving agent population showed
no instability.
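
The structure of such a parameter sweep can be sketched as follows. This is only an outline under stated assumptions: `run_once` is a hypothetical callable standing in for one simulation run that returns the label of the final macro-state, the limit distribution at each grid point is estimated from the relative frequencies over repeated runs, and `always_max` is a placeholder rather than our simulator.

```python
import math
from collections import Counter


def degree_of_instability(probabilities, n_states):
    """Entropy of a macro-state distribution, using logarithms to base n_states."""
    return sum(-p * math.log(p, n_states) for p in probabilities if p > 0)


def stability_sweep(run_once, rates, n_states, runs):
    """Map each (mutation_rate, crossover_rate) pair to an estimated d_ins."""
    results = {}
    for mutation_rate in rates:
        for crossover_rate in rates:
            # Count how often each final macro-state occurs at this grid point.
            finals = Counter(run_once(mutation_rate, crossover_rate) for _ in range(runs))
            probabilities = [count / runs for count in finals.values()]
            results[(mutation_rate, crossover_rate)] = degree_of_instability(
                probabilities, n_states
            )
    return results


rates = [i / 10 for i in range(11)]  # 0% to 100% in 10% increments


def always_max(mutation_rate, crossover_rate):
    # Placeholder simulator (an assumption): every run ends in the same macro-state.
    return "M_max"


grid = stability_sweep(always_max, rates, n_states=2, runs=5)
print(len(grid), max(grid.values()))  # 121 grid points, none showing instability
```

Replacing `always_max` with a real simulation run, and `runs` with a much larger number, would give a grid of values of the kind graphed in Figure 19.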

As one would expect, an extremely high mutation rate has a destabilising effect
on an evolving agent population. Also, as expected, the crossover
rate had only a minimal effect, because variation from crossover was limited
once the population had matured, consisting of agent-sequences identical or very
similar to one another. It should also be noted that stability is different from
performance: although the population showed no instability with mutation rates at or
below 60%, it only reached the maximum macro-state $M_{max}$ with a mutation
rate of 10% or above, while at 0% it was stable at a sub-optimal macro-state. We
can therefore conclude that the degree of instability of our extended Chli-DeWilde
stability can be used to perform stability analyses (similar to sensitivity analyses
(Cacuci et al., 2003)) of evolving agent populations.

7.4.1 Summary

Collectively, the experimental results confirm that Chli-DeWilde stability has been
successfully extended to evolving agent populations, while our definition for the 
degree of instability provides a macroscopic value to characterise the level of 
stability. 



7.5 Diversity

7.5.1 User Request Length

We started by varying the user request length according to the available 
distributions, expecting the size of the corresponding applications (agent- 
sequences) to be distributed according to the length of the user requests, i.e. the 
longer the user request, the larger the agent-sequence needed in response. 

We first applied the Uniform distribution as a control, and graphed the results
in Figure 20. The observed frequencies of the application (agent-sequence) size
mostly matched the expected frequencies, which was confirmed with a $\chi^2$ test; with
a null hypothesis of no significant difference and sixteen degrees of freedom, the $\chi^2$
value was 2.588, below the critical 0.95 $\chi^2$ value of 7.962.
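
The test statistic used in these comparisons can be sketched in a few lines of Python. The sketch assumes the standard Pearson form of the statistic; the observed and expected counts are hypothetical values chosen to give seventeen categories (sixteen degrees of freedom) and are not the experimental data, while the critical value is the one quoted in the text.

```python
def chi_squared_statistic(observed, expected):
    """Pearson's chi-squared statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for 17 size categories (16 degrees of freedom); not the paper's data.
expected = [50.0] * 17
observed = [48, 52, 51, 49, 50, 53, 47, 50, 50, 51, 49, 52, 48, 50, 50, 49, 51]

CRITICAL_VALUE = 7.962  # critical value quoted in the text for sixteen degrees of freedom

statistic = chi_squared_statistic(observed, expected)
print(round(statistic, 3), statistic < CRITICAL_VALUE)  # 0.8 True
```

A statistic below the critical value is read, as in the text, as no significant difference between the observed and expected frequencies.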

We then applied the Gaussian distribution as a reasonable assumption for the
behaviour of a large group of users, and graphed the results in Figure 21. The
observed frequencies of the application (agent-sequence) size matched the expected
frequencies with only minor variations, which was confirmed by a $\chi^2$ test; with a
null hypothesis of no significant difference and sixteen degrees of freedom, the $\chi^2$
value was 2.102, below the critical 0.95 $\chi^2$ value of 7.962.

Finally, we applied the Power Law distribution to represent a relatively
extreme variation in user behaviour, and graphed the results in Figure 22. The
observed frequencies of the application (agent-sequence) size matched the expected
frequencies with some variation, which was confirmed by a $\chi^2$ test; with a null
hypothesis of no significant difference and sixteen degrees of freedom, the $\chi^2$ value
was 5.048, below the critical 0.95 $\chi^2$ value of 7.962.

Figure 20 Graph of Uniformly Distributed Agent-Sequence Length Frequencies: The
observed frequencies of the application (agent-sequence) size mostly matched
the expected frequencies, which was confirmed by a $\chi^2$ test; with a null
hypothesis of no significant difference and sixteen degrees of freedom, the $\chi^2$
value was 2.588, below the critical 0.95 $\chi^2$ value of 7.962.

[Figure 21 plot: Expected and Observed frequencies against Agent-sequence Length]

Figure 21 Graph of Gaussian Distributed Agent-Sequence Length Frequencies: The
observed frequencies of the application (agent-sequence) size matched the
expected frequencies with only minor variations, which was confirmed by a $\chi^2$
test; with a null hypothesis of no significant difference and sixteen degrees of
freedom, the $\chi^2$ value was 2.102, below the critical 0.95 $\chi^2$ value of 7.962.

Figure 22 Graph of Power Law Distributed Agent-Sequence Length Frequencies: The
observed frequencies of the application (agent-sequence) size matched the
expected frequencies with some variation, which was confirmed by a $\chi^2$ test;
with a null hypothesis of no significant difference and sixteen degrees of
freedom, the $\chi^2$ value was 5.048, below the critical 0.95 $\chi^2$ value of 7.962.

There were a couple of minor discrepancies, common to all the experiments.
First, there was a small number of individual agents at the thousandth time
step, caused by the typical user behaviour of continuously creating new agents
(services). Second, while the chi-squared tests confirmed that there was no
significant difference between the observed and expected frequencies of the
application (agent-sequence) size, there was still a bias to larger applications
(solutions). This bias was evident visually in the graphs of the experiments, and
numerically in the chi-squared test of the Power Law distribution experiment, as
that distribution favoured smaller agent-sequences. The cause of this bias was most
likely some aspect of bloat (see footnote c) not being fully controlled.

7.5.2 User Request Modularity 

Next, we varied the user request modularity (number of attributes per atomic 
service) according to the available distributions, expecting the sophistication of the 
agents to be distributed according to the modularity of the user requests, i.e. the 
more complicated (in terms of modular non-reducible tasks) the user request, the 
more sophisticated (in terms of the number of attributes) the agents needed in 
response. 

c When variable length representations of solutions are used, a well-known phenomenon 
arises, called bloat, in which the individuals of an evolving population tend to grow in size 
without gaining any additional advantage (Langdon and Poli, 1997). 
We first applied the Uniform distribution as a control, and graphed the results
in Figure 23. The observed frequencies for the number of agent attributes mostly
matched the expected frequencies, which was confirmed by a $\chi^2$ test; with a null
hypothesis of no significant difference and ten degrees of freedom, the $\chi^2$ value was
1.049, below the critical 0.95 $\chi^2$ value of 3.940.

We then applied the Gaussian distribution as a reasonable assumption for the
behaviour of a large group of users, and graphed the results in Figure 24. The
observed frequencies for the number of agent attributes again followed the expected
frequencies, but there was variation which led to a failed $\chi^2$ test; with a null
hypothesis of no significant difference and ten degrees of freedom, the $\chi^2$ value was
50.623, not below the critical 0.95 $\chi^2$ value of 3.940.

Finally, we applied the Power Law distribution to represent a relatively
extreme variation in user behaviour, and graphed the results in Figure 25. The
observed frequencies for the number of agent attributes also followed the expected
frequencies, but there was variation which led to a failed $\chi^2$ test; with a null
hypothesis of no significant difference and ten degrees of freedom, the $\chi^2$ value was
61.876, not below the critical 0.95 $\chi^2$ value of 3.940.

In all of the experiments the observed frequencies for the number of agent
attributes followed the expected frequencies, with some variation in two of the
experiments. Collectively, the experimental results confirm that the self-organised
diversity of the evolving agent populations is relative to the selection pressures
of the user base, which was confirmed statistically for most of the experiments.
The minor experimental failures, in which the Digital Ecosystem responded
more slowly than in the other experiments, show the potential to optimise the
Digital Ecosystem, because the evolutionary self-organisation of an ecosystem is a
slow process (Begon et al., 1996), even in the accelerated form present in our Digital
Ecosystem.

Figure 23 Graph of Uniformly Distributed Agent Attribute Frequencies: The
observed frequencies for the number of agent attributes mostly matched the
expected frequencies, which was confirmed by a $\chi^2$ test; with a null
hypothesis of no significant difference and ten degrees of freedom, the $\chi^2$ value
was 1.049, below the critical 0.95 $\chi^2$ value of 3.940.

Figure 24 Graph of Gaussian Distributed Agent Attribute Frequencies: The observed
frequencies for the number of agent attributes again followed the expected
frequencies, but there was variation, which led to a failed $\chi^2$ test; with a null
hypothesis of no significant difference and ten degrees of freedom, the $\chi^2$ value
was 50.623, not below the critical 0.95 $\chi^2$ value of 3.940.

[Figure 25 plot: Expected and Observed frequencies against Number of Attributes per Agent]

Figure 25 Graph of Power Law Distributed Agent Attribute Frequencies: The
observed frequencies for the number of agent attributes also followed the
expected frequencies, but there was variation, which led to a failed $\chi^2$ test;
with a null hypothesis of no significant difference and ten degrees of freedom,
the $\chi^2$ value was 61.876, not below the critical 0.95 $\chi^2$ value of 3.940.

7.5.3 Summary

Collectively, the experimental results confirm that the self-organised diversity of 
the evolving agent populations is relative to the selection pressures of the user 
base, which was confirmed statistically for most of the experiments. So, we have 
determined an effective understanding and quantification for the self-organised 
diversity of the evolving agent populations of our Digital Ecosystem. The
minor experimental failures, in which the Digital Ecosystem responded more slowly
than in the other experiments, have shown that there is potential to optimise the
Digital Ecosystem, because the evolutionary self-organisation of an ecosystem is a
slow process (Begon et al., 1996), even in the accelerated form present in our Digital
Ecosystem.

8 Conclusions 

Overall an insight has been achieved into where and how self-organisation occurs 
in our Digital Ecosystem, and what forms this self-organisation can take and 
how it can be quantified. The hybrid nature of the Digital Ecosystem resulted
in the most suitable definition for the self-organised complexity coming from the
biological sciences, and the most suitable definition for the self-organised stability
coming from the computer sciences. However, we were unable to use any existing
definition for the self-organised diversity, because the hybrid nature of the Digital
Ecosystem makes it unique, and so we constructed our own definition based on
variation relative to the user base. The (Physical) complexity definition applies
to a single point in time of the evolving agent populations, whereas the
(Chli-DeWilde) stability definition applies at the end of these instantiated evolutionary
processes, and our diversity definition applies to the optimality of the distribution
of the agents within the evolving agent populations of the Digital Ecosystem. The
experimental results have generally supported the hypotheses, and have provided
more detail on the behaviour of the self-organising phenomena under investigation,
showing some of their properties. For the self-organised diversity, they have shown
that there is potential for optimising the Digital Ecosystem, which we have since
considered elsewhere (Briscoe, 2009; Briscoe and De Wilde, 2008).

8.1 Complexity

We investigated the self-organised complexity of evolving agent populations 
through experimental simulations, for which our extended Physical Complexity 
was consistent with the original. We then investigated the Efficiency, which 
performed as expected, confirmed by the numerical results and population 
visualisations matching our intuitive understanding. We then applied the 
Efficiency to the determination of clusters when subjecting an evolving agent 
population to a multi-objective selection pressure. The numerical results, combined
with the visualisation of the multi-cluster population, confirmed the ability of the
Efficiency to act as a clustering coefficient, not only indicating the occurrence of
clustering, but also the number of clusters (for pure clusters). We also confirmed
that the Efficiency $E_c$ for populations with clusters was able to calculate correctly
the self-organised complexity of evolving agent populations with clusters.

We have determined an effective understanding and quantification for the 
self-organised complexity of the evolving agent populations of our Digital 
Ecosystem. Furthermore, the understanding and techniques we have developed 
have applicability beyond evolving agent populations, as wide as the original 
Physical Complexity, which has been applied from DNA (Adami and Cerf, 2000) 
to simulations of self-replicating programmes (Lenski et al., 2003). 

8.2 Stability 

We then investigated the self-organised stability of evolving agent populations 
through experimental simulations, and the results showed that there was a limit 
probability distribution, and that it was non-uniform. Furthermore, the reaching 
of the maximum macro-state was confirmed by a visualisation matching the 
numerical results. We then applied our degree of instability to determine that there 
was no instability under normal conditions, and then performed a stability analysis 
(similar to a sensitivity analysis (Cacuci et al., 2003)) showing the variation of the 
self-organised stability under varying conditions. 

We have determined an effective understanding and quantification for the 
self-organised stability of the evolving agent populations of our Digital Ecosystem. 
Also, our extended Chli-DeWilde stability is applicable to other Multi-Agent 
Systems with evolutionary dynamics. Furthermore, our degree of instability 
provides a definition for the level of stability, applicable to Multi-Agent Systems
with or without evolutionary dynamics. 

8.3 Diversity

We then investigated the self-organised diversity of evolving agent populations
through experimental simulations. First, we varied the user request length according
to the different distributions, and tested whether the observed frequencies of the
agent-sequence length matched the expected frequencies, which we confirmed with
successful chi-squared tests. Second, we varied the user request modularity according
to the different distributions, and tested whether the observed frequencies for the
number of agent attributes matched the expected frequencies, again using
chi-squared tests. Under the Gaussian and Power Law distributions the
chi-squared tests failed, most likely because the evolving agent populations were
still self-organising to match the user behaviour, because at the time the Digital
Ecosystem was sampled each user had placed an average of only ten requests.